Topology Matters in Communication

Electronic Colloquium on Computational Complexity

Arkadev Chattopadhyay, Jaikumar Radhakrishnan, Atri Rudra

May 4, 2014

Abstract

We provide the first communication lower bounds that are sensitive to the network topology for computing natural and simple functions by point-to-point message-passing protocols in the Number-in-Hand model. All previous lower bounds were either for the broadcast model or assumed full connectivity of the network. As a special case, we obtain bounds of the form Ω(k²n) on the randomized communication complexity of many simple functions for a broad class of networks having k distributed nodes, each holding an n-bit input string. The best previous bounds were of the form Ω(kn). The main tool that we use for deriving our bounds is a new connection with the theory of metric embeddings. This enables us to prove a variety of results that include the following: a distributed XOR lemma; a tight bound (up to poly-log factors) on the randomized complexity of Element-Distinctness, which answers a question of Phillips, Verbin and Zhang (SODA 12, [PVZ12]); and new lower bounds for composed functions that were also left open in the work of Phillips et al. [PVZ12]. Finally, these bounds yield new topology-dependent bounds for several natural graph problems considered by Woodruff and Zhang (DISC 13, [WZ13]).

Affiliations: Arkadev Chattopadhyay, School of Technology and Computer Science, Tata Institute of Fundamental Research, arkadev.c@tifr.res.in. Jaikumar Radhakrishnan, School of Technology and Computer Science, Tata Institute of Fundamental Research, jaikumar.radhakrishnan@gmail.com. Atri Rudra, Department of Computer Science and Engineering, University at Buffalo, SUNY, atri@buffalo.edu; research supported in part by an NSF CCF grant.

1 Introduction

Multi-party communication complexity was introduced in the work of Chandra, Furst and Lipton [CFL83]¹, where k players have inputs X_1, ..., X_k ∈ {0,1}^n and want to compute some common Boolean function f : ({0,1}^n)^k → {0,1} with the goal of minimizing the total communication between the k players. We assume that each player can only look at her own input, i.e. we follow the so-called Number-in-Hand (NIH) model. The NIH multi-party model seems to have been first considered by Dolev and Feder [DF89]. The case k = 2 is the standard two-party communication complexity introduced by Yao [Yao79]. Both two-party and multi-party communication complexity have numerous applications: see e.g. the excellent book on this topic [KN97]. The generalization to the multi-party communication complexity model has to decide on various modes of communication:

1. Whether the communication is broadcast (i.e. everyone sees a message sent by a player) or point-to-point (messages have a single sender and a single receiver);

2. If the communication is point-to-point, how the players' communication channels are connected, i.e. what is the structure of the underlying graph topology G?

For various reasons, the original model was with broadcast communication, except for the early work of Duris and Rolim [DR98], who proved lower bounds on the deterministic and nondeterministic communication complexity in the point-to-point model over the complete graph. Recently, there has been a surge of interest in the point-to-point model [PVZ12, WZ12, WZ13, BEO+13]. This is because the point-to-point model arguably better captures many modern-day networks and has been studied in many distributed models: e.g. the BSP model of Valiant [Val90], models for MapReduce [KSV10, GSZ11], massively parallel models to compute conjunctive queries [BKS13, KS11], distributed models for learning [BBFM12, IPSV12] and the core distributed computing literature [DKO12].
The recent surge of interest in this model is also in part motivated by proving lower bounds for the distributed functional monitoring framework (see e.g. the recent survey [Cor13]), which generalizes the popular data streaming model [Mut05]. However, all of the recent work assumes that the underlying topology is fully connected². In our opinion this is a severe restriction, since in many situations assuming full connectivity is too strong an assumption. Indeed, in areas like sensor networks, researchers have considered the effects of network topology with some success [HK12] for simple topologies like trees. The following is the motivating question for our work, which was also mentioned as an interesting direction to pursue in [BEO+13]:

¹ The paper of Chandra, Furst and Lipton [CFL83] considered the model where each player gets to see everyone else's input (the Number-on-Forehead or NOF model). To the best of our knowledge, the first paper with non-trivial randomized lower bounds in the NIH model was the work of Alon, Matias and Szegedy [AMS99].

² It is worthwhile to note that the effect of network topology on the cost of communication has been analyzed to quite an extent when the networks are dynamic, in the context of distributed computing; see for example the recent survey of Kuhn and Oshman [KO11]. In this work, in contrast, we are mainly concerned with static networks of arbitrary topology, as embodied in the NIH model.

Can we prove lower bounds on multi-party communication complexity in the NIH point-to-point model that are sensitive to the topology of the connections between the players?

To see how the network topology can make a big difference in the total communication cost, let us consider the trivial algorithm that can compute any function f: all players send their input to one designated player. If the topology G is the complete graph or has constant diameter, then the trivial algorithm uses O(kn) communication. Now consider the case when G is the line graph. In this case it is best for all players to send their input to the middle node. However, note that in this case the total communication is Ω(k²n).³ For general graphs, the total communication of the trivial algorithm is bounded by the objective function of the 1-median problem, where the distances are the shortest-path distances in G. Thus, ignoring the topology, as the current works do, could result in bounds that are sub-optimal by a Θ(k) factor. Our interest is in identifying situations where we can recover this extra Θ(k) factor in our lower bounds and, in general, match the bound of the trivial algorithm for any topology.

Our Contributions. Our main contribution is the first set of lower bounds for the NIH point-to-point communication model that are sensitive to the network topology.⁴ We present a general framework to prove lower bounds for general topologies. Our framework generalizes many of the existing lower bound results for the complete-graph topology and uses a new connection to the theory of metric embeddings. To the best of our knowledge this is the first work to apply results from metric embeddings to prove lower bounds on communication complexity. We would like to clarify that while none of our proofs is technically difficult by itself, most of the tools we use are quite non-trivial.
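To make the cost of the trivial algorithm concrete, here is a small illustrative sketch (ours, not from the paper): the cost of shipping every n-bit input to the best designated player is exactly the 1-median objective of G scaled by n. The graph representation (adjacency dicts) and the parameters k = 8, n = 100 are our own choices.

```python
from collections import deque

def bfs_distances(adj, src):
    """Hop distances from src in an unweighted graph given as {v: [neighbors]}."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def trivial_cost(adj, n_bits):
    """Trivial protocol: every player forwards its n-bit input to the best
    designated player, i.e. the 1-median of G under shortest-path distance."""
    return min(sum(bfs_distances(adj, c).values()) for c in adj) * n_bits

k, n = 8, 100
line = {i: [j for j in (i - 1, i + 1) if 0 <= j < k] for i in range(k)}
clique = {i: [j for j in range(k) if j != i] for i in range(k)}
print(trivial_cost(clique, n))  # Theta(kn): 7 * 100 = 700
print(trivial_cost(line, n))    # Theta(k^2 n): 16 * 100 = 1600
```

On the complete graph the 1-median cost is (k−1)n, while on the line it is ≈ k²n/4, illustrating the Θ(k) gap discussed above.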
We believe our main contribution is more conceptual: we identify certain key components and show how to combine them to obtain topology-dependent lower bounds. We also believe that our framework is fairly general and should be widely applicable. As partial justification of this belief, we extend many known results on the complete graph to topology-dependent, essentially tight, lower bounds for general graphs.

A natural function on which to start proving strong lower bounds is the set disjointness problem, i.e. we want to compute ⋁_{i=1}^{n} ⋀_{j=1}^{k} X_{j,i}. Set disjointness is the canonical problem for two-party communication complexity, whose hardness implies lower bounds for myriads of problems in diverse models (see for example the survey [CP10]). It was also recently shown by Braverman et al. [BEO+13] that for the k-party set disjointness problem on the complete graph, the total communication is Ω(kn). However, it is not too hard to see that for any topology, the intersection of the k sets (and in particular the set disjointness problem), as well as the union of the k sets, can be computed with O(kn) total communication.⁵ This implies

³ The two endpoint players contribute a total communication of kn, the next pair contributes a total communication of (k−2)n, and so on.

⁴ We note here that 2-party communication complexity lower bounds easily yield lower bounds of the form Ω(d·n), where d is the diameter of G. In this work, we present bounds of the form Ω(k·d·n) in some situations.

⁵ Consider a spanning tree of the underlying graph G and compute the intersection/union as one goes through all the nodes of the spanning tree, say in a pre-order traversal. It is easy to check that the total communication over each edge is O(n) and that each edge needs to carry only two messages.

that existing reductions (e.g. those in [WZ13]) for graph problems from set disjointness and related problems cannot be used to prove better, topology-dependent lower bounds. Thus, we need a problem where the players do need to send all their information to one player. Towards this end, consider the following problem, which we call Element-Distinctness: the players want to decide if X_i ≠ X_j for every i ≠ j ∈ [k]. If we allow randomization, the trivial algorithm on the line graph takes Õ(k²) communication.⁶ In this case it does seem that all the pairs need to be compared, and hence it seems that the trivial algorithm is indeed optimal. We show that this is indeed the case for the Element-Distinctness problem, as well as for a number of other problems, for all graph topologies.

Our Results. We show that for all of the following problems, the trivial algorithm is optimal up to poly(log k) factors (unless mentioned otherwise) for any network topology:

Element-Distinctness: Output 1 if and only if X_i ≠ X_j (as vectors) for every i ≠ j ∈ [k]. This answers a question of Phillips et al. [PVZ12], who asked for the communication complexity of this function just for the case of G being a complete graph. In fact, [PVZ12] considered Element-Distinctness to be a variant of k-Equality, in which players output 1 if and only if X_i = X_j for every i ≠ j ∈ [k]. They seemed to suggest that Element-Distinctness and Equality have the same complexity. We show that while for the complete graph they indeed have the same complexity, for general topologies the complexities of the two problems are entirely different.

We prove the following XOR lemma. Consider any partition of [k] (for even k) into two disjoint sets S and S̄, and a bijection ρ between S and S̄. Let f : {0,1}^n × {0,1}^n → {0,1}.
Then computing the function XOR-f ≡ ⊕_{i∈S} f(X_i, X_{ρ(i)}) cannot be done better than the following trivial algorithm (up to an Õ(√k) factor): every pair (i, ρ(i)) for i ∈ S computes f(X_i, X_{ρ(i)}) using the best two-party communication protocol for f, and then, say, the players in S compute the final output bit. For certain functions we can improve the gap from Õ(√k) to just poly(log k) factors. This extends the XOR lemma of Barak et al. [BBCR10] from the 2-party setting to the general multi-party setting⁷.

XOR lemmas are of general interest in computer science. It is natural to consider what happens if one replaces the XOR function with OR or AND. Woodruff and Zhang [WZ14] showed that OR-f has communication complexity Ω(k·R_ε(f)) for the complete graph. While our techniques recover this result for complete graphs, we observe that such a generic OR/AND lemma cannot have a topology-dependent extension to general graphs. However, for some specific functions of natural interest, we prove topology-dependent tight bounds. These include OR-Equality, OR-Disjointness and AND-Disjointness (also known as the Tribes function). Besides being interesting by themselves, these results are also useful in proving lower bounds for the several graph problems described next.

⁶ There is no linear dependence on n since the parties can just send fingerprints of their input to the designated player.

⁷ We use the result of Barak et al. to prove ours.

We extend the lower bounds on the graph problems considered in [WZ13] to the general topology case. In these problems the k players get k subgraphs of some graph H (edges can be duplicated) and the players want to solve one of the following five problems: determining (i) the degree of a vertex in H; (ii) if H is acyclic; (iii) if H is triangle-free; (iv) if H is bipartite; and (v) if H is connected. In all these cases we show that the trivial algorithm of all players sending their subgraphs to a designated player is the best possible up to poly(log k) factors. Our reductions for (ii)-(v) are different from those in [WZ13], as the hard problem used in [WZ13] can be solved with O(kn) communication for any topology.

In Section 4.5, we present some other results on composed functions that showcase the generality of our techniques. While composed functions are a natural and important class of functions that have been widely studied in communication complexity, their study in the context of the point-to-point communication model was suggested in the recent work of Phillips et al. [PVZ12].

Our Techniques. We now present an overview of our proof techniques. As mentioned earlier, each of our steps is technically simple, and it is the combination of non-trivial results that seems crucial to proving our stronger results. We believe that this technical simplicity makes our techniques easily and widely applicable to many problems. Later, we will also show that our techniques generalize most of the existing techniques used to prove lower bounds in the special case when G is completely connected.

As usual for proving randomized lower bounds, we will prove a distributional lower bound for the problem. One way such lower bounds are obtained for 2-party problems is by proving a discrepancy/corruption bound for two-dimensional rectangles/sub-matrices. For k players this would generalize to analyzing k-dimensional tensors, which seems very challenging for large k.
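The O(kn) upper bound invoked above (the spanning-tree protocol of footnote 5) can be sketched in code. This is our own simulation, not from the paper: sets are modeled as n-bit integers, the tree is a rooted {parent: [children]} dict, and we count bits per edge to check that every edge carries exactly two n-bit messages.

```python
def union_via_tree(tree_children, root, inputs, n):
    """Compute the union of k n-bit sets over a spanning tree: each node
    sends the union of its subtree up to its parent (one n-bit message per
    edge), then the root's answer is flooded back down (a second n-bit
    message per edge), so every edge carries exactly 2n bits in total."""
    bits_per_edge = {}

    def up(v):
        acc = inputs[v]
        for c in tree_children.get(v, []):
            acc |= up(c)
            bits_per_edge[(v, c)] = n      # child -> parent message
        return acc

    result = up(root)

    def down(v):
        for c in tree_children.get(v, []):
            bits_per_edge[(v, c)] += n     # parent -> child broadcast
            down(c)

    down(root)
    return result, bits_per_edge

# A path 0-1-2-3 rooted at 0; inputs are subsets of a size-n universe.
n = 8
tree = {0: [1], 1: [2], 2: [3]}
res, cost = union_via_tree(tree, 0, {0: 0b0001, 1: 0b0010, 2: 0b0100, 3: 0b1000}, n)
print(bin(res))                                # 0b1111
print(all(b == 2 * n for b in cost.values()))  # True: O(n) bits per edge
```

With k−1 tree edges at 2n bits each, the total is O(kn) for any connected topology, which is why disjointness-style reductions cannot give topology-dependent bounds.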
A more tractable option seems to be to reduce the k-player problem to a hard 2-player problem by finding a convenient cut in the graph, where we give the inputs on each side of the cut to a specific player. Several obvious difficulties come up when pursuing this option. We next sketch them and broadly describe how we get around them.

Note that we cannot work with just a single cut to get better than Ω(kn) bounds. This is because each player in the reduced 2-party problem across such a cut gets an O(kn)-length input. Thus, we have to work with a family of cuts such that across each (or most) of them we have a fairly hard 2-party problem. Optimistically, one may then hope that these complexities can be added up to take us beyond the kn barrier. There are two things to take care of immediately before one can try implementing this idea. First, the global distribution on the k players' inputs has to be chosen such that across every (or at least most) cuts it becomes a hard distribution for the 2-party instance. Second, even if that happens, why are we allowed to add the distributional complexities of the various problems across cuts? Usually, the (µ,ɛ)-distributional complexity of a function f, denoted by D_{µ,ɛ}(f), is defined as the worst-case cost of the best deterministic protocol that errs with probability ɛ when inputs are sampled from µ. But then we cannot add the costs of these various problems across cuts, because the worst case of each individual problem may not arise from a globally consistent input. We get around this problem by considering the notion of expected cost for the 2-party

problem. Using linearity of expectation, one can now add the costs of the various 2-party problems. However, for technical reasons, we have to consider the ɛ-error randomized expected cost w.r.t. µ, i.e. a protocol that, like a true randomized protocol, errs with small probability on every input, but whose cost we measure only w.r.t. a distribution µ. This is a simple but subtle fix to the problem.

To illustrate our idea with a concrete and simple example, let us consider the Element-Distinctness problem on the line graph. Consider the following distribution on X_1, ..., X_k: randomly pick them to be k distinct values. Note that by linearity of expectation, the total expected communication is the sum of the expected communication over each edge. Consider any edge e. Let us assume e is such that there are i ≤ k/2 players to the left of e and k−i players to the right of e. Then note that any ɛ-error protocol for Element-Distinctness is solving the set disjointness problem between the sets {X_1, ..., X_i} and {X_{i+1}, ..., X_k} for every input with high probability. Ignoring the size of the domain of these values, this implies an Ω(i) lower bound on the communication over e. This is because our initial distribution is chosen so that the induced distribution for the 2-party set disjointness problem is such that every ɛ-error protocol has high expected cost w.r.t. this distribution. Now just summing up the expected cost over each edge gives us an Ω(k²) lower bound on the total expected communication, which was our aim.

The above argument crucially used the fact that the topology is a line graph. In particular, when we considered an edge e, we used the fact that it induces a cut on the players, which in turn induces a two-party set disjointness problem.
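The edge-by-edge accounting above can be checked numerically. A minimal sketch (ours): on the line graph, the edge with i players on one side carries an induced 2-party instance of size min(i, k−i), and summing these per-edge Ω(min(i, k−i)) bounds already gives the quadratic total.

```python
def line_graph_total(k):
    """Sum the per-edge lower bounds on the line graph: the edge with i
    players on one side carries an induced 2-party disjointness instance
    of size min(i, k - i), contributing Omega(min(i, k - i)) expected bits."""
    return sum(min(i, k - i) for i in range(1, k))

k = 100
total = line_graph_total(k)
print(total)  # 2500 = k^2/4, i.e. Omega(k^2)
```

The sum evaluates to ⌊k²/4⌋, matching the Ω(k²) claim (with the Ω(n) domain-size factor suppressed, as in the text).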
The more crucial aspect that might have been swept under the rug is the following fact: if one considers the set of all k−1 cuts, then each pair (i, j) is separated exactly |i−j| times, which is the same as the distance between players i and j on the line graph. Moreover, each edge e appears in precisely one cut, which ensures that summing up the expected costs is a valid counting of the expected total cost. It turns out that for general graphs, we just need to find a set of cuts with the property that every pair of players is separated as many times as (up to some slack) the shortest-path distance between them in G. Further, to generalize the sum-of-expectations argument, we also need to ensure that no edge of G is separated by too many cuts. This is where the theory of metric embeddings plays a role. It turns out that one can find such cuts via known results on embedding metrics into the ℓ_1 metric. For those familiar with metric embeddings, the connection is not that surprising, since embeddings into ℓ_1 and cuts have a very close relationship. In fact, for technical reasons we need the embedding to have a third property, but that too is satisfied by known embeddings, e.g. the one due to Bourgain.

Once we have the above cut technology in place, we need to select a global distribution on inputs such that the corresponding 2-party problems across cuts are hard w.r.t. the expected cost measure over the induced distributions across cuts. In most cases, we are able to appeal to a known 2-party result to finish off the argument. For instance, in the case of Element-Distinctness, the corresponding 2-party problem is set disjointness. For the XOR lemma, the induced problem is exactly the 2-party XOR problem, and we apply the result of Barak et al. [BBCR13].
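The two properties required of the cut family can be verified directly for the line graph, where the k−1 edge cuts realize the shortest-path metric exactly. A small check (our own code; the vertex labeling 0..k−1 is an assumption):

```python
from itertools import combinations

def edge_cuts_of_line(k):
    """The k-1 cuts of the line graph on vertices 0..k-1: cut t puts
    {0,...,t} on one side and {t+1,...,k-1} on the other."""
    return [set(range(t + 1)) for t in range(k - 1)]

k = 9
cuts = edge_cuts_of_line(k)

# Property 1: every pair (i, j) is separated exactly d(i, j) = |i - j| times.
for i, j in combinations(range(k), 2):
    sep = sum((i in A) != (j in A) for A in cuts)
    assert sep == abs(i - j)

# Property 2: each edge (t, t+1) is a cut edge of exactly one cut in the family.
for t in range(k - 1):
    crossing = sum((t in A) != (t + 1 in A) for A in cuts)
    assert crossing == 1
print("cut family matches the line metric")
```

For general graphs these exact equalities are replaced by the "up to some slack" guarantees that ℓ_1 embeddings (e.g. Bourgain's) provide.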

Connections to Related Work. Finally, we put our techniques in the context of existing techniques used to prove lower bounds for the case when G is fully connected. In particular, we argue that our techniques essentially generalize many of the existing ones. The first lower bounds for the message-passing NIH model seem to be due to Duris and Rolim [DR98]. They also considered the complete-graph topology (the co-ordinator model), and their bounds were for deterministic and non-deterministic complexity. In particular, their proof uses a generalization of the 2-party fooling-set argument, which does not seem to apply to bounded-error randomized protocols.

Very recently, the symmetrization technique was introduced by Phillips et al. [PVZ12] and was further developed by Woodruff and Zhang [WZ12, WZ13, WZ14]. At a very high level, the core idea of symmetrization is as follows. First consider the case when G is a star graph of diameter 2 with a co-ordinator node at the center. Prototypical hard problems to consider are functions of the form ⋀_{i=1}^{k} f(X_i, Y), where the center gets Y and the k leaves of G get X_1, ..., X_k. If ν is a hard distribution for the 2-party function f, then the trick is to define a hard distribution µ on (X_1, ..., X_k, Y) such that for every i ∈ [k] the effective distribution on (X_i, Y) is ν. The argument, slightly re-phrased in our language, proceeds as follows: pick a random cut among the k cuts corresponding to the k edges. By the definition of µ, the induced 2-party problem across each cut is f, and hence the communication across it is Ω(R(f)), where R(f) is the randomized two-party communication complexity of f. Then we note that since the cut was picked uniformly at random, and the distribution µ is symmetric with respect to the leaf nodes, the communication across such a random cut is in expectation a Θ(1/k) fraction of the total communication, which leads to an overall Ω(k·R(f)) lower bound on the total communication.
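The averaging step at the end of the symmetrization argument is just the following identity, shown here as a sketch (our own toy code, with arbitrary per-edge costs standing in for a protocol's expected communication): on the star, every leaf edge is its own cut, so a uniformly random cut captures exactly a 1/k fraction of the total in expectation.

```python
import random

def random_cut_share(edge_costs):
    """On the star graph each of the k leaf edges forms its own cut, so the
    expected cost across a uniformly random one of the k cuts is exactly
    (total communication) / k -- the averaging step used by symmetrization."""
    k = len(edge_costs)
    return sum(edge_costs) / k

costs = [random.randint(1, 50) for _ in range(10)]  # per-leaf-edge communication
assert abs(random_cut_share(costs) * len(costs) - sum(costs)) < 1e-9
print("expected cut cost is total/k")
```

Combined with the Ω(R(f)) bound per cut, this immediately yields the Ω(k·R(f)) total; the point made next in the text is that this averaging needs the leaf symmetry that general topologies lack.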
By contrast, our technique does not need this symmetry property, though our use of linearity of expectation is similar. Indeed, in Section 4.4, we show how to recover the lower bound on the OR of f from [WZ13] using our techniques. Note that the cuts in a star graph are all similar, as all leaves are symmetric with respect to the prototypical example. As identified by the authors of [PVZ12] themselves, this property is lost, even for star graphs, when the inputs held by the leaf nodes are not symmetric with respect to the function that the players want to compute. For a general graph topology, there might be very little symmetry left. In particular, in our technique, the cuts obtained are arbitrary, with no guarantee of symmetry. Nevertheless, our technique seems flexible enough to handle such cases. One technique that we cannot yet subsume with ours is the result of Braverman et al. [BEO+13] that proves a lower bound of Ω(kn) on the set disjointness problem. It is an interesting open question to see if we can port the techniques of Braverman et al. to our setting.

2 Preliminaries

Let f be any k-variate Boolean function of the form f : ({0,1}^n)^k → {0,1}, where each input X_i takes a value in {0,1}^n. Let G(V, E) be a graph with k vertices, i.e. V = {1, ..., k}. In the message-passing game on the graph G for the function f, there are k players of unbounded computational power. Player i is at vertex i of G and has access only to input X_i. This distribution of inputs is often called the Number-in-Hand (NIH) model⁸. The players want

⁸ There is another important multiparty communication model, called the Number-on-Forehead model, where Player i sees every input except X_i. We do not consider this model in this work.

to compute f collaboratively according to a unanimously agreed-upon communication protocol, in which players send and receive messages to and from other players. In any such protocol, Player i is allowed to communicate with Player j if and only if their vertices are neighbors in the graph G. As in the standard two-party communication game, what and to whom Player i communicates at any round depends only on the input X_i and the messages received by Player i from other players until that round. Further, in contrast to the broadcast model, the message sent by Player i to Player j is received only by Player j and no other player. At the termination of the protocol, each player should know the value f(X_1, ..., X_k). The cost of an execution of the protocol is the total number of bits communicated over all the edges in all the rounds.

Just as in standard two-party communication complexity, the protocols can be deterministic or randomized. All randomized protocols considered in this paper are public-coin, in the sense that all players see all random coin tosses without any communication. This is the most powerful model of randomness, and thus lower bounds for this model imply lower bounds for weaker models. We also need two notions of the cost of a protocol: the worst-case cost and the average-case/expected cost with respect to a distribution over the inputs. For any fixed ɛ < 1/2, a randomized protocol makes ɛ error if, on every input, the probability (over the random coin tosses of the protocol) of giving the wrong answer is at most ɛ. The worst-case cost of such a protocol Π, over the coin tosses and the inputs, is denoted by Cost(Π). The randomized ɛ-error message-passing complexity of a function f for a graph G, denoted by R_{ɛ,G}(f), is the worst-case cost of the best ɛ-error protocol. Protocols with ɛ = 0 are identified by a special term: zero-error protocols.
The zero-error complexity of a function f, denoted by R_0(f), is the worst-case expected cost of the best randomized protocol computing f with no error⁹. Given a distribution µ over ({0,1}^n)^k, the expected cost of a protocol Π, denoted by ECost_µ(Π), is the expectation of its cost over both the internal random coin tosses of the protocol and the distribution µ. The µ-expected ɛ-error complexity of a function f over G, denoted by ER_{µ,ɛ,G}(f), is the expected cost of the best ɛ-error protocol over the graph G for computing f. Naturally, ER_{µ,0,G}(f) denotes the µ-expected zero-error complexity of f.

Let G(V, E) be a graph. A cut C is a partition of its vertex set V into two parts (A, B). A pair of vertices u, v ∈ V is separated by the cut C if they lie in different parts of the cut. The set of all pairs of vertices separated by C is denoted by M(C). An edge in E is a cut edge if its endpoints are separated by the cut. The set of cut edges of C is denoted by E(C). Given vertices u, v in a graph G, we will use d_G(u, v) (or just d(u, v) when G is clear from the context) to denote the length of the shortest path between u and v in G. Throughout this paper, the underlying network graph G will be a connected graph.

Let f be any k-party problem associated with the graph G, where k = |V(G)|, and let µ be a distribution on the inputs to f. For any edge e ∈ E(G), let ECost_µ(Π, e) denote the expected total number of bits sent over e in both directions. Then, for any protocol Π and cut C of G, let ECost_µ(Π, C) denote the expected total communication across C: ECost_µ(Π, C) = Σ_{e∈E(C)} ECost_µ(Π, e). Let C = {C_1, ..., C_t} be a set of cuts. Define the expected cost of Π over C as ECost_µ(Π, C) = Σ_{i=1}^{t} ECost_µ(Π, C_i). We state below a simple but useful consequence of the linearity of expectation.

⁹ Note that here the expectation is only over the internal coin tosses of the protocol.

Observation 2.1 Let C be a set of cuts of G such that any edge e of G appears as a cut edge in at most m cuts in C. Then, ECost_µ(Π) ≥ ECost_µ(Π, C)/m.

We will need the following results from basic 2-party communication complexity, where the graph of communication is just an edge connecting Players 1 and 2, often called Alice and Bob respectively. In general, k-party Set-Disjointness, denoted by k-DISJ, is defined as the following function: there is some universe [N] and Player i gets a subset X_i of the universe. The function outputs 1 if and only if there is no element that appears in every X_i. The game in which the players are promised that the input sets further satisfy the condition |X_i| = l and that there is at most one element common to all the X_i is denoted by k-UDISJ_l. Let µ[l] be the distribution defined in the following way¹⁰ on the inputs of 2-UDISJ_l: with probability 3/4, sample uniformly a pair of sets from the space of all pairs of non-intersecting sets, each of size l, and with probability 1/4 sample uniformly a pair of sets, each of size l, that intersect in precisely one element. For notational convenience, we will often drop l from µ[l] when the context makes the value of l clear.

Theorem 2.2 (Razborov [Raz92]) There exist universal constants δ, β such that D_{µ,δ}(2-UDISJ_l) ≥ β·l, provided the size of the universe is at least 4l+1.

The following result will be useful for proving our lower bounds.

Lemma 2.3 Let ν be a distribution on the inputs of f, where f is any 2-party function. Let ν_0 (ν_1) be the marginal distribution of ν on the zeroes (ones) of f. Then, ER_{ν_0,ɛ′}(f) ≥ (ɛ − ɛ′)·D_{ν,ɛ}(f), where we assume ɛ′ < ɛ.

Proof: Assume that there is an ɛ′-error randomized protocol Π with expected cost c w.r.t. ν_0 (ν_1), where ɛ′ < ɛ. Consider the following new protocol Π′: let ɛ_d = ɛ − ɛ′. Π′ runs Π until c/ɛ_d bits have been communicated or Π has halted. If Π has halted, Π′ outputs the answer of Π. Otherwise, Π′ halts and outputs 1 (0).
We claim that the following holds:

Pr_{r, x∼ν}[Π′(x, r) ≠ f(x)] ≤ ɛ.

Let x be a zero (one) of f. Conditioned on this, x is sampled from ν_0 (ν_1). Hence, applying Markov's inequality, the probability that Π does not produce an answer within c/ɛ_d bits of communication is less than ɛ_d. Hence,

Pr_{r, x∼ν}[Π′(x, r) ≠ f(x) | f(x) = 0] < ɛ_d + ɛ′.

Now consider the other case, where x is a one (zero) of f. Note that on every input Π errs with probability at most ɛ′. Moreover, when x is a one (zero) of f, Π′ makes an error only if Π did. Thus,

Pr_{r, x∼ν}[Π′(x, r) ≠ f(x) | f(x) = 1] ≤ ɛ′.

¹⁰ Razborov [Raz92] describes this distribution in another, equivalent way that is more convenient for his analysis.

Combining these two cases immediately gives us our claim, since ɛ_d + ɛ′ = ɛ. The worst-case cost of Π′ is at most c/ɛ_d. By fixing the randomness r of Π′, we get a deterministic protocol of cost at most c/ɛ_d that errs w.r.t. ν with probability at most ɛ. Thus, c/ɛ_d ≥ D_{ν,ɛ}(f), i.e. c ≥ (ɛ − ɛ′)·D_{ν,ɛ}(f).

Combining Theorem 2.2 with Lemma 2.3, the following corollary easily follows. Let µ_0[l] (µ_1[l]) be the uniform distribution on pairs of disjoint (uniquely-intersecting) sets, each of size l. When the context makes it clear, we drop l from the notation.

Corollary 2.4 For each fixed ɛ < 1/2, there exists β > 0 such that ER_{µ_0,ɛ}(2-UDISJ_l) and ER_{µ_1,ɛ}(2-UDISJ_l) are both at least β·l, if the size of the universe is at least 4l+1.

We derive the following direct but useful consequence of Lemma 2.3:

Corollary 2.5 Let ν be a distribution on the inputs of f, where f is any 2-party function. Then, ER_{ν,ɛ′}(f) ≥ (ɛ − ɛ′)·D_{ν,ɛ}(f), where we assume ɛ′ < ɛ.

Proof: Using Lemma 2.3, we know that for any ɛ′-error protocol Π for f, ECost_{ν_i}(Π) ≥ (ɛ − ɛ′)·D_{ν,ɛ}(f) for i ∈ {0, 1}, where ν_i is the marginal of ν supported on the points at which f evaluates to i. The corollary follows.

Another classical function in communication complexity is Equality. We consider its natural k-party version, denoted k-EQ, which outputs 1 if and only if all of its k n-bit input strings are equal. While EQ is relatively easy for bounded-error protocols, the cost of zero-error protocols is large under the following distribution: let S ⊆ {0,1}^n and let U^k_{=,S} be the uniform distribution on k-tuples of equal strings from S. When S = {0,1}^n, we drop S from the subscript of U. Whenever the value of k is obvious from the context, we drop the superscript of U. We outline a proof of the following classical result for the sake of completeness.

Theorem 2.6 ER_{U_{=,S},0}(2-EQ) = Ω(log |S|).

Proof: This uses a simple fooling-set argument. Let µ = U_{=,S}. Let Π be a zero-error randomized protocol.
Then, there exists a fixing a of the random coins of Π such that the deterministic protocol Π_a has expected cost (w.r.t. µ) at most ECost_µ(Π). Since Π was a zero-error protocol on every input, Π_a makes no errors either. Hence, a standard fooling-set argument shows that, for every input in the support of µ, Π_a must generate a distinct transcript; since the transcripts of a deterministic protocol form a prefix-free set, the expected transcript length under µ must be at least log |S|.

2.1 Information Theoretic Techniques

Information-theoretic techniques have been increasingly used to prove lower bounds in communication complexity (see for example [BYJKS04, JKS03, BBCR13, BEO+13]), and we will use some of these techniques here. We quickly recall the basic notions needed. Let X, Y, Z be discrete random variables taking values in some discrete set D. Then the entropy of X, denoted by H(X), is defined as follows:

H(X) = Σ_{x∈D} Pr[X = x] · log(1/Pr[X = x]).

Informally, the entropy of X measures the uncertainty associated with it. Given two random variables X, Y, knowing the value of one may reduce the uncertainty associated with the other. More formally, the conditional entropy H(X | Y) is defined as follows: H(X | Y) = E_y[H(X | Y = y)], where H(X | Y = y) is the entropy of the conditional distribution of X given Y = y. It can be shown that H(X | Y) ≤ H(X). The mutual information between X and Y, denoted by I(X ; Y), is defined as follows: I(X ; Y) = H(X) − H(X | Y). It is a non-trivial and useful property that mutual information is non-negative and symmetric. One can also define the conditional mutual information I(X ; Y | Z) as follows: I(X ; Y | Z) = H(X | Z) − H(X | Y, Z).

Given a 2-party randomized communication protocol Π and some input distribution µ, its external µ-information cost, denoted by IC_µ(Π), is defined as follows: IC_µ(Π) = I_{(X,Y)∼µ}(X, Y ; Π(X, Y)), where Π(X, Y) is the random transcript of the protocol.

Remark The transcript of an execution of a protocol contains the concatenation of the messages sent by each player, along with the public random coin tosses of the protocol.

The ε-error external information complexity of a function f w.r.t. distribution µ, denoted by IC_{µ,ε}(f), is the µ-information cost of the best ε-error protocol for f. An input distribution µ is called product if it can be decomposed as the product of a distribution µ_X on Alice's input and a distribution µ_Y on Bob's input, i.e., µ(X, Y) = µ_X(X) · µ_Y(Y). Product distributions are convenient to analyze. However, the distributions µ that we will need to analyze will not always be product, but rather convex combinations of product distributions. Such combinations are conveniently expressed in terms of an auxiliary random variable D. In particular, µ may be a non-product distribution for (X, Y, D), and yet the conditional distribution (X, Y) | D = d is product for every d.
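These definitions are easy to exercise numerically. The sketch below (the helper names are ours, not the paper's) computes H(X), H(X | Y) and I(X ; Y) from a small joint distribution and checks the two properties just mentioned, non-negativity and symmetry of mutual information:

```python
import math
from collections import defaultdict

def entropy(pmf):
    # H(X) = -sum_x Pr[X=x] log Pr[X=x]
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def marginal(joint, axis):
    m = defaultdict(float)
    for xy, p in joint.items():
        m[xy[axis]] += p
    return m

def cond_entropy(joint):
    # H(X | Y) = E_y[ H(X | Y = y) ]
    h = 0.0
    for y, py in marginal(joint, 1).items():
        row = {x: p / py for (x, yy), p in joint.items() if yy == y}
        h += py * entropy(row)
    return h

def mutual_info(joint):
    # I(X ; Y) = H(X) - H(X | Y)
    return entropy(marginal(joint, 0)) - cond_entropy(joint)

# a small non-product joint distribution on {0,1} x {0,1}
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
flipped = {(y, x): p for (x, y), p in joint.items()}
assert mutual_info(joint) >= 0                                # non-negativity
assert abs(mutual_info(joint) - mutual_info(flipped)) < 1e-9  # symmetry
```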
Towards analyzing the cost of protocols w.r.t. such distributions, we need the slightly more general notion of the conditional information cost of a protocol Π, denoted by CIC_µ(Π) and defined as follows: CIC_µ(Π) = I_µ(X, Y ; Π(X, Y) | D). This gives rise, just as in the case of information cost, to the notion of the ε-error conditional information complexity of a function f w.r.t. µ, denoted by CIC_{µ,ε}(f). We will need to establish a relationship between the information complexity of a function and its expected bounded-error randomized complexity. Towards that, let us recall a useful inequality that lower bounds the expected length of any encoding of a random variable by its entropy. The proof can be found in any standard text on information theory, such as [CT91].
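As a quick numerical sanity check of this inequality for binary codes (q = 2), the sketch below builds a Huffman code, which is instantaneous, and verifies that its expected length lies between H(X) and H(X) + 1. All helper names are ours.

```python
import heapq
import itertools
import math

def huffman_expected_length(probs):
    # build a binary Huffman code; return the expected codeword length.
    # the counter breaks ties so heapq never compares the list payloads.
    cnt = itertools.count()
    heap = [(p, next(cnt), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:          # every merge adds one bit to each member
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(cnt), s1 + s2))
    return sum(p * l for p, l in zip(probs, lengths))

probs = [0.5, 0.25, 0.125, 0.125]
H = -sum(p * math.log2(p) for p in probs)
L = huffman_expected_length(probs)
# entropy lower-bounds the expected length of any instantaneous binary code
assert H - 1e-9 <= L <= H + 1
```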

Theorem 2.7 (Theorem 5.3.1 in [CT91]) The expected length L of any instantaneous q-ary code for a random variable X satisfies L ≥ H(X)/log q.

We are now ready to make the connection between the two notions of complexity of a function:

Theorem 2.8 For any distribution µ over the inputs to a function f, and any ε < 1, the following holds: ER_{µ,ε}(f) = Ω(IC_{µ,ε}(f)).

Proof: For any 2-party protocol Π, let us write its transcript as (Π′(X, Y), R), where R is the public coin tosses of Π and Π′(X, Y) is the concatenation of the messages sent by the protocol. Note that we may assume that either the protocol uses a prefix-free code over the binary alphabet, or it uses a special character to delimit the messages sent by the players to each other, making the encoding of the transcript prefix-free over an alphabet of size 3. Using the chain rule of information, the information cost of Π can be re-written as follows:

IC_µ(Π) = I_µ(X, Y ; R) + I_µ(X, Y ; Π′(X, Y) | R).

But I_µ(X, Y ; R) = 0, as the public coin tosses are independent of (X, Y). Hence, expanding the conditional information term, we have

IC_µ(Π) = E_R[ I_µ(X, Y ; Π′(X, Y) | R = r) ].

However, invoking Theorem 2.7, we get

ECost_µ(Π | R = r) · log 3 ≥ H(Π′ | R = r) ≥ I_µ(X, Y ; Π′(X, Y) | R = r).

Now the claimed bound of our theorem easily follows.

We now state some results about the information complexity of functions which we will use. Let (U, V, W) be a triple of random variables sampled from {0,1}^3 as follows: sample W uniformly at random from {0,1}. If W = 0, fix U = 0 and sample V uniformly at random from {0,1}. If W = 1, fix V = 0 and sample U uniformly at random. Call this distribution τ. Note that the conditional distribution (U, V) | W = i, denoted by τ_i, is product for each i ∈ {0,1}. Let ν be the marginal distribution of (U, V). Let (X, Y, D) ∼ τ^n =: η, and let µ be the marginal of η on (X, Y), which is the same as ν^n. Then, Bar-Yossef et al. obtained the following remarkable result:

Theorem 2.9 (Bar-Yossef et al. [BYJKS04]) IC_{µ,ε}(UDISJ_n) ≥ CIC_{η,ε}(UDISJ_n) ≥ (n/4) · (1 − 2√ε).
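For concreteness, the distribution η = τ^n can be sampled as follows (a sketch with our own function names). Note that every coordinate satisfies U_i ∧ V_i = 0, so the marginal µ = ν^n is supported entirely on disjoint pairs:

```python
import random

def sample_tau():
    # (U, V, W): W uniform in {0,1}; if W = 0 then U = 0 and V is random,
    # if W = 1 then V = 0 and U is random
    w = random.randint(0, 1)
    u = 0 if w == 0 else random.randint(0, 1)
    v = random.randint(0, 1) if w == 0 else 0
    return u, v, w

def sample_eta(n):
    # (X, Y, D) ~ tau^n, coordinatewise i.i.d.
    coords = [sample_tau() for _ in range(n)]
    x, y, d = zip(*coords)
    return list(x), list(y), list(d)

x, y, d = sample_eta(16)
assert all(xi & yi == 0 for xi, yi in zip(x, y))  # mu = nu^n never intersects
```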

The following simple corollary will be useful for us.

Corollary 2.10 ER_{µ,ε}(UDISJ_n) = Ω(n · (1 − 2√ε)).

Proof: Follows easily by combining Theorem 2.9 and Theorem 2.8.

We next consider another important function in communication complexity, the tribes function, defined as follows: TRIBES_{m,n}(X, Y) := ∧_{i=1}^{m} DISJ_n(X_i, Y_i), where X = (X_1, ..., X_m), Y = (Y_1, ..., Y_m) and each X_i, Y_i ∈ {0,1}^n. In a 2-party game for tribes, Alice gets X and Bob gets Y. Let S = (S_1, ..., S_m) with each S_i ∈ [n], and let D = (D_1, ..., D_m) with each D_i ∈ {0,1}^n. The tuples (X_i, Y_i, S_i, D_i) are i.i.d. random variables sampled from a distribution γ. We sample (U, V, S, D) from γ as follows: sample S uniformly at random from [n] and D uniformly at random from {0,1}^n. For every j ∈ [n], do the following: if j ≠ S, then sample (U_j, V_j) from τ_ℓ with ℓ = D_j (recall that the distributions τ_0 and τ_1 were described just before Theorem 2.9). If j = S, then set (U_j, V_j) = (1, 1). Note that the conditional distribution (U, V) | S = s, D = d is product. The common marginal distribution of each (X_i, Y_i) is denoted by ρ; thus (X, Y) has distribution ρ^m. We state the following result of Jayram et al.

Theorem 2.11 (Jayram et al. [JKS03]) IC_{ρ^m,ε}(TRIBES_{m,n}) ≥ (mn/6) · (1 − 2√ε).

Just as before, we derive the following:

Corollary 2.12 ER_{ρ^m,ε}(TRIBES_{m,n}) = Ω(mn · (1 − 2√ε)).

Proof: Follows easily by combining Theorem 2.11 and Theorem 2.8.

3 A set of special cuts

Using Bourgain's theorem on embedding any graph metric into ℓ_1 with low distortion, we can derive the following:

Theorem 3.1 (Key Tool) Let G be any graph with k vertices. Then there exists a set of cuts C that satisfies the following properties:

1. Every pair of vertices u, v in G is separated by at least Ω(log k) · d(u, v) many cuts in C.

2. Each edge of G appears as a cut-edge in at most O(log² k) many cuts in C.

We will prove the above theorem by first connecting the question about cuts to the problem of embedding a graph into ℓ_1 space. In particular, an embedding with specific properties immediately implies the required set of cuts:

Lemma 3.2 Let G = (V, E) be a graph and let f : V → R^D be a map, for some dimension D, with the following properties: (i) for every u, v ∈ V, we have ‖f(u) − f(v)‖_1 ≥ α · d(u, v); (ii) for every edge (u, v) ∈ E, we have ‖f(u) − f(v)‖_1 ≤ β; and (iii) for every dimension i ∈ [D], the set {f(u)_i | u ∈ V} is the set {0, 1, 2, ..., M} for some integer M. Then there exists a collection of cuts C such that:

1. Every pair of vertices u, v in G is separated by at least α · d(u, v) many cuts in C.

2. Each edge of G appears as a cut-edge in at most β many cuts in C.

We note that conditions (i) and (ii) alone imply that the mapping f is an embedding with distortion β/α. We need property (iii) to construct the required set of cuts C. Next, we note that Bourgain's embedding of graphs (and, in particular, of any metric) into ℓ_1 satisfies properties (i)-(iii) of Lemma 3.2.

Theorem 3.3 For any graph G with k vertices, there exists a mapping f that satisfies properties (i)-(iii) with α = Ω(log k) and β = O(log² k).

Note that Lemma 3.2 and Theorem 3.3 immediately imply Theorem 3.1. In the rest of the section, we prove Lemma 3.2 and outline why Bourgain's embedding proves Theorem 3.3.

Proof of Lemma 3.2: Let f be a mapping that satisfies property (iii). We define the set of cuts C and show that the number of cuts that separate any pair of vertices u, v ∈ V is exactly ‖f(u) − f(v)‖_1; properties (i) and (ii) then complete the proof. For every dimension i ∈ [D], we define a family of cuts C_i, and the final set of cuts is their union ∪_{i∈[D]} C_i. Fix any i ∈ [D], and let {f(u)_i | u ∈ V} = {0, 1, 2, ..., M}.
Then, for every j ∈ [M], include the cut C_{i,j} = {u | f(u)_i < j − 1/2} in C_i. By property (iii), these cuts C_{i,j} are distinct for fixed i. To complete the proof, we need to argue that for every u, v ∈ V, exactly ‖f(u) − f(v)‖_1 many cuts in C separate u and v. Towards this end, note that for any fixed i ∈ [D], the number of cuts in C_i that separate u and v is exactly |f(u)_i − f(v)_i|; this just follows from the construction of C_i and property (iii). Thus, the total number of cuts in C that separate u and v is exactly

∑_{i=1}^{D} |f(u)_i − f(v)_i| = ‖f(u) − f(v)‖_1,

as desired.

Theorem 3.3, without the added property (iii), is the usual statement of Bourgain's theorem for ℓ_1 embeddings. Next we sketch why Bourgain's construction also satisfies property (iii).

Proof of Theorem 3.3: Bourgain's map f is defined as follows. Pick D = O(log² k) random subsets of V. For a given u ∈ V and a coordinate i ∈ [D] that corresponds to the random subset S, define f(u)_i = d(u, S), i.e., the distance from u to the closest vertex in S. Since the graph G is unweighted and connected, it is easy to check that this construction satisfies property (iii). Indeed, consider the graph G′ obtained by contracting S into a super vertex s, where (s, u) is an edge if and only if (u, s′) is an edge for some s′ ∈ S. Now run BFS starting from s in G′. Note that d(u, S) is the level of u in the corresponding BFS tree. Further, since G′ is connected (because G was connected), the values f(u)_i lie in the set {0, 1, ..., M}, where M is the depth of the BFS tree. Properties (i) and (ii), with β = D and α = Ω(log k), are well-known for Bourgain's map. In particular, Lemma 6.3 in Lecture 3 from [R 06] proves (ii) and Theorem 6.4 in Lecture 3 from [R 06] proves (i).

We end with two remarks. First, we note that property (iii) is a bit stronger than what we need. Indeed, we can relax the condition that the distinct values in any dimension be consecutive integers to the following: all consecutive values are separated by Θ(1). Then, in the proof of Lemma 3.2, the number of cuts separating u and v would be Θ(‖f(u) − f(v)‖_1). This only affects the constants, and thus Theorem 3.1 would still be true. We chose the stronger version because it makes the proof a bit simpler, and because Bourgain's embedding already satisfies the stronger property. Second, one might wonder whether one can bypass the connection to embeddings, but it is easy to check that a set of cuts as defined in Theorem 3.1 indeed defines an embedding of G into ℓ_1; further, the ratio β/α = Ω(log k) is unavoidable, since this lower bound on the distortion of embedding into ℓ_1 holds for expander graphs (see e.g. Section 7 in Lecture 3 from [R 06]).

4 Applications

4.1 Element Distinctness

We prove an almost tight result on the randomized complexity of the Element-Distinctness function. Recall that this function outputs 1 if and only if the k input strings are pairwise distinct, i.e., no string is repeated. For any graph G and vertex v, let Δ(v) = ∑_{u∈V(G)} d(u, v). We call a vertex c a center of G if Δ(c) ≤ Δ(v) for every other vertex v. Let the diameter of G be denoted by D(G).

Theorem 4.1 Let G be any graph with k vertices and let c be a center. The bounded-error randomized k-party complexity of Element-Distinctness over G is Θ(Δ(c)), ignoring poly-log(k) factors. The zero-error randomized complexity of Element-Distinctness is Θ(D(G) · n + Δ(c)), again ignoring poly-log(k) factors.
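The quantities appearing in Theorem 4.1 are directly computable from G by breadth-first search. A minimal sketch (helper names are ours; the graph is assumed to be given as an adjacency list):

```python
from collections import deque

def bfs_dists(adj, s):
    # single-source shortest-path distances in an unweighted graph
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    return dist

def center_and_diameter(adj):
    # Delta(v) = sum_u d(u, v); a center c minimizes Delta; D(G) = max distance
    delta, diam = {}, 0
    for v in adj:
        dist = bfs_dists(adj, v)
        delta[v] = sum(dist.values())
        diam = max(diam, max(dist.values()))
    c = min(delta, key=delta.get)
    return c, delta[c], diam

# a path on 5 vertices: the middle vertex is the center
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
c, delta_c, D = center_and_diameter(path)
assert (c, delta_c, D) == (2, 6, 4)
```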

Proof: Let τ be the following distribution: pick k distinct strings Z_1, ..., Z_k from {0,1}^n at random, and assign them at random to the k nodes of G so that each node gets exactly one string. For the first part of the theorem, we first show that ER_{τ,ε}(ELM-DIST) is Ω̃(Δ(c)). Using Theorem 3.1, we obtain our set of cuts C. Let C_i be any cut in C, and let Π be any randomized ε-error protocol over G for Element-Distinctness. Let V_i^0 and V_i^1 be the two sets of vertices separated by the cut C_i, and let ℓ_i = min{|V_i^0|, |V_i^1|}. The simple but useful claim is the following:

Claim 4.2 There is an ε-error randomized 2-party protocol solving UDISJ_{ℓ_i} w.r.t. µ_0[ℓ_i] with expected cost at most ECost_τ(Π, C_i).

Let us first show why this claim gives us our desired bound. The claim, along with Corollary 2.4, immediately yields that ECost_τ(Π, C_i) ≥ β · ℓ_i. Note that ℓ_i ≥ |V_i^0| · |V_i^1| / k. Hence,

∑_{i=1}^{t} ECost_τ(Π, C_i) ≥ (β/k) · ∑_{i=1}^{t} |V_i^0| · |V_i^1|.

Observe that |V_i^0| · |V_i^1| is exactly the number of pairs of vertices separated by the cut C_i. Using property 1 of C from Theorem 3.1, we bound it further:

∑_{i=1}^{t} ECost_τ(Π, C_i) ≥ (β/k) · Ω(log k) · ∑_{u,v∈V(G): u≠v} d(u, v) = (β/k) · Ω(log k) · ∑_u Δ(u) ≥ β · Ω(log k) · Δ(c).

Combining these with Observation 2.1 and Theorem 3.1, we get our bound as follows: ECost_τ(Π) ≥ (∑_{C_i∈C} ECost_τ(Π, C_i)) / O(log² k) ≥ β · Δ(c) / O(log k), where β depends only on ε.

What remains is to prove Claim 4.2. W.l.o.g., assume ℓ_i = |V_i^1| and let m_i = |V_i^0| − ℓ_i ≥ 0. Consider any fixed assignment a of distinct strings to the first m_i nodes of V_i^0, and let τ_a denote the conditional distribution of the inputs at the other nodes. We derive a 2-party ε-error protocol Π_a for UDISJ_{ℓ_i}, where the sets have elements from a universe of size 2^n − m_i. Alice and Bob, on receiving sets X and Y respectively from this universe with |X| = |Y| = ℓ_i, simulate Π as follows: Alice naturally encodes her set X as ℓ_i strings, each n bits long, and Bob does the same with his set Y. Alice assigns the encoded elements of her set, together with the fixed assignment a, to the nodes in V_i^0, and Bob assigns his elements to the nodes in V_i^1.
Then they simulate Π in the natural way, with Alice (resp. Bob) communicating bits to Bob (resp. Alice) whenever in Π a message is communicated along a cut-edge from V_i^0 (resp. V_i^1) to V_i^1 (resp. V_i^0). Using the properties of Π, it is easily verified that this is an ε-error protocol for UDISJ_{ℓ_i}. Observe that τ_a is the same distribution as µ_0[ℓ_i]. Using Corollary 2.4, we immediately get ECost_{τ_a}(Π, C_i) = ECost_{µ_0[ℓ_i]}(Π_a) ≥ β · ℓ_i for each fixed a. Thus, we conclude that ECost_τ(Π, C_i) ≥ β · ℓ_i, establishing Claim 4.2.

The upper bound for the first part of the theorem follows from the natural fingerprinting algorithm. Every player sends a fingerprint of its input to the vertex c, using O(log k)-sized hashes. The player at c then looks at all the (k choose 2) pairs of players and checks whether the hashes of the corresponding inputs are the same or not. Note that this solves the problem as long as the collision probability of the hashes is O(1/k²), which can be arranged with O(log k)-sized hashes.

For proving the bound on zero-error protocols, we need to consider another distribution. Let γ be the distribution on k inputs, each n bits long, generated by the following sampling method: consider two vertices u, v in G such that d(u, v) = D(G). Let Z = {z_1, ..., z_{k−2}} be a set of k − 2 distinct strings and set S = {0,1}^n \ Z. Let M be the output of a random coin toss. If M = 1, sample the inputs from τ; otherwise, sample them from ν, where ν is given as follows: assign each vertex other than u, v one distinct string from Z, sample a string x at random from S, and assign x to both X_u and X_v. Let Π be any zero-error protocol for Element-Distinctness. We next bound ECost_γ(Π): clearly,

ECost_γ(Π) = (1/2) · [ECost_τ(Π) + ECost_ν(Π)].

The first term is at least Ω(Δ(c)/log k) by the first part of the theorem. We now bound the second term on the RHS above. Using a standard breadth-first search tree, we generate t = D(G) many cuts C_1, ..., C_t such that each C_i separates u and v, and the sets of cut-edges of C_i and C_j are disjoint for i ≠ j. Thus, ECost_ν(Π) ≥ ∑_i ECost_ν(Π, C_i). We claim that ECost_ν(Π, C_i) ≥ R_{U_{=,S}}(2-EQ). Given this claim, one immediately gets the desired bound on the RHS. To prove the claim, consider the following protocol Π′ for Alice and Bob to solve 2-EQ on inputs x, y ∈ S: Alice and Bob simulate, according to Π, the nodes on the sides of C_i containing the vertices u and v, respectively. Alice assigns x to X_u and Bob assigns y to X_v. Then they follow Π, assuming the other vertices got their fixed inputs from Z. Clearly, this correctly solves 2-EQ for x, y ∈ S. Further, it easily follows that ECost_{U_{=,S}}(Π′) = ECost_ν(Π, C_i) = Ω(log |S|), where the last step uses Theorem 2.6.
This completes the argument for the lower bound. The upper bound for the second part of the theorem follows from the following modification of the protocol from the first part. In the first phase, each player sends a hash of size O(log(nk)) to the player at vertex c. Using the hashes, the player at vertex c checks which pairs of players can safely be ruled to have distinct inputs. Call a pair of players that cannot be ruled out after the first phase surviving. In the second phase, the player at vertex c considers all surviving pairs (i, j) in some arbitrary order. For each surviving pair (i, j), players i and j send their inputs to c. If X_i = X_j, then the protocol terminates with output 0. Otherwise, the protocol moves on to the next surviving pair. If all surviving pairs have distinct inputs, then the protocol terminates with output 1. It is easy to check that the protocol always terminates with the correct answer. To complete the proof, we briefly argue the claimed communication upper bound. First, note that the protocol needs at most one pair (i, j) with X_i = X_j to send their inputs to c, with total communication O(D(G) · n). Now, for every fixed pair (i, j) with X_i ≠ X_j, the probability that it survives the first phase is O(1/(nk²)). This implies that the expected communication between i, j and c in the second phase is O(D(G)/k²). Thus, the total communication in the second phase over all pairs with distinct inputs is O(D(G)). The total communication in the first phase is Õ(Δ(c)), by the same argument as in the upper bound for the first part of the theorem. Adding up all the communication costs completes the proof.

4.2 Distributed XOR Lemma

So far, we have proved lower bounds showing that the trivial algorithm of sending all inputs (or hashes thereof) to one player is optimal. In this section, we consider functions for which the trivial algorithm can potentially have a smaller communication cost. In particular, we consider functions where the players are paired up and one only needs to send information from one player to its matched player. However, for the worst-case pairing this makes no difference: see Lemma 5.1 for a formal statement. To be more precise, let M be a disjoint pairing² of the vertices of G = (V, E), with |V| = k even, and let f : {0,1}^n × {0,1}^n → {0,1} be a boolean function. Consider the function f_{G,M} : ({0,1}^n)^k → {0,1} defined as follows:

f_{G,M}((X_u)_{u∈V}) = ⊕_{(u,v)∈M} f(X_u, X_v),

where ⊕ denotes the XOR operator. Here is the trivial algorithm to compute f_{G,M}: for each pair (u, v) ∈ M, the players corresponding to u and v run the best possible (say, randomized) protocol to compute f(X_u, X_v). Note that this has communication cost at most a factor D_M times the optimal communication complexity of computing f, where

D_M = ∑_{(u,v)∈M} d(u, v).

Next, we show that the above is, in general, tight to within an Õ(√k) factor. Further, for functions f where the optimal lower bound on the communication complexity of f can be proved via distributional complexity under a product distribution, the trivial algorithm is tight to within poly-logarithmic factors.

Theorem 4.3 For every constant ε > 0 and every binary function f, the following are true:

1. R_{ε,G}(f_{G,M}) ≥ D_M · Ω(R_{2ε}(f)) / (√k · log k · log(nk)),

where R_γ(f) is the optimal 2-party communication complexity of f with randomized protocols that err with probability γ.

2. R_{ε,G}(f_{G,M}) ≥ D_M · Ω(R^{prod}_{2ε}(f)) / poly log(nk),

where R^{prod}_{2ε}(f) denotes the best lower bound on the 2ε-error distributional complexity of f provable under product distributions.

² We stress that the pairs (u, v) ∈ M need not be edges of G.
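To make the trivial algorithm's cost parameter concrete, here is a small sketch (helper names are ours; the choice of f, equality of two strings, is purely for illustration). It evaluates f_{G,M} as the XOR over matched pairs and computes D_M, the factor by which routing each 2-party protocol through G inflates the cost:

```python
from collections import deque

def bfs_dist(adj, s, t):
    # d(s, t) in an unweighted graph G
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        if u == t:
            return dist[u]
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    raise ValueError("disconnected graph")

def f_gm(inputs, matching, f):
    # f_{G,M}((X_u)_u) = XOR over pairs (u, v) in M of f(X_u, X_v)
    out = 0
    for u, v in matching:
        out ^= f(inputs[u], inputs[v])
    return out

def d_m(adj, matching):
    # D_M = sum over matched pairs of d(u, v); the pairs need not be edges
    return sum(bfs_dist(adj, u, v) for u, v in matching)

# a 4-cycle, matched across the diagonals; f = equality of the two strings
cyc = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
M = [(0, 2), (1, 3)]
X = {0: "aa", 2: "aa", 1: "ab", 3: "ba"}
eq = lambda a, b: int(a == b)
assert f_gm(X, M, eq) == 1  # EQ(aa, aa) xor EQ(ab, ba) = 1 xor 0
assert d_m(cyc, M) == 4     # each diagonal pair is at distance 2
```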


More information

Kernelization Lower Bounds: A Brief History

Kernelization Lower Bounds: A Brief History Kernelization Lower Bounds: A Brief History G Philip Max Planck Institute for Informatics, Saarbrücken, Germany New Developments in Exact Algorithms and Lower Bounds. Pre-FSTTCS 2014 Workshop, IIT Delhi

More information

Sketching in Adversarial Environments

Sketching in Adversarial Environments Sketching in Adversarial Environments Ilya Mironov Moni Naor Gil Segev Abstract We formalize a realistic model for computations over massive data sets. The model, referred to as the adversarial sketch

More information

25 Minimum bandwidth: Approximation via volume respecting embeddings

25 Minimum bandwidth: Approximation via volume respecting embeddings 25 Minimum bandwidth: Approximation via volume respecting embeddings We continue the study of Volume respecting embeddings. In the last lecture, we motivated the use of volume respecting embeddings by

More information

Lecture 4. 1 Circuit Complexity. Notes on Complexity Theory: Fall 2005 Last updated: September, Jonathan Katz

Lecture 4. 1 Circuit Complexity. Notes on Complexity Theory: Fall 2005 Last updated: September, Jonathan Katz Notes on Complexity Theory: Fall 2005 Last updated: September, 2005 Jonathan Katz Lecture 4 1 Circuit Complexity Circuits are directed, acyclic graphs where nodes are called gates and edges are called

More information

Approximation norms and duality for communication complexity lower bounds

Approximation norms and duality for communication complexity lower bounds Approximation norms and duality for communication complexity lower bounds Troy Lee Columbia University Adi Shraibman Weizmann Institute From min to max The cost of a best algorithm is naturally phrased

More information

Linear Sketches A Useful Tool in Streaming and Compressive Sensing

Linear Sketches A Useful Tool in Streaming and Compressive Sensing Linear Sketches A Useful Tool in Streaming and Compressive Sensing Qin Zhang 1-1 Linear sketch Random linear projection M : R n R k that preserves properties of any v R n with high prob. where k n. M =

More information

PRGs for space-bounded computation: INW, Nisan

PRGs for space-bounded computation: INW, Nisan 0368-4283: Space-Bounded Computation 15/5/2018 Lecture 9 PRGs for space-bounded computation: INW, Nisan Amnon Ta-Shma and Dean Doron 1 PRGs Definition 1. Let C be a collection of functions C : Σ n {0,

More information

CS 350 Algorithms and Complexity

CS 350 Algorithms and Complexity 1 CS 350 Algorithms and Complexity Fall 2015 Lecture 15: Limitations of Algorithmic Power Introduction to complexity theory Andrew P. Black Department of Computer Science Portland State University Lower

More information

18.5 Crossings and incidences

18.5 Crossings and incidences 18.5 Crossings and incidences 257 The celebrated theorem due to P. Turán (1941) states: if a graph G has n vertices and has no k-clique then it has at most (1 1/(k 1)) n 2 /2 edges (see Theorem 4.8). Its

More information

Quantum Communication Complexity

Quantum Communication Complexity Quantum Communication Complexity Ronald de Wolf Communication complexity has been studied extensively in the area of theoretical computer science and has deep connections with seemingly unrelated areas,

More information

Lecture 21: Quantum communication complexity

Lecture 21: Quantum communication complexity CPSC 519/619: Quantum Computation John Watrous, University of Calgary Lecture 21: Quantum communication complexity April 6, 2006 In this lecture we will discuss how quantum information can allow for a

More information

Lecture 3: Oct 7, 2014

Lecture 3: Oct 7, 2014 Information and Coding Theory Autumn 04 Lecturer: Shi Li Lecture : Oct 7, 04 Scribe: Mrinalkanti Ghosh In last lecture we have seen an use of entropy to give a tight upper bound in number of triangles

More information

14.1 Finding frequent elements in stream

14.1 Finding frequent elements in stream Chapter 14 Streaming Data Model 14.1 Finding frequent elements in stream A very useful statistics for many applications is to keep track of elements that occur more frequently. It can come in many flavours

More information

COS597D: Information Theory in Computer Science September 21, Lecture 2

COS597D: Information Theory in Computer Science September 21, Lecture 2 COS597D: Information Theory in Computer Science September 1, 011 Lecture Lecturer: Mark Braverman Scribe: Mark Braverman In the last lecture, we introduced entropy H(X), and conditional entry H(X Y ),

More information

Lecture 10. Sublinear Time Algorithms (contd) CSC2420 Allan Borodin & Nisarg Shah 1

Lecture 10. Sublinear Time Algorithms (contd) CSC2420 Allan Borodin & Nisarg Shah 1 Lecture 10 Sublinear Time Algorithms (contd) CSC2420 Allan Borodin & Nisarg Shah 1 Recap Sublinear time algorithms Deterministic + exact: binary search Deterministic + inexact: estimating diameter in a

More information

Lecture 6: Expander Codes

Lecture 6: Expander Codes CS369E: Expanders May 2 & 9, 2005 Lecturer: Prahladh Harsha Lecture 6: Expander Codes Scribe: Hovav Shacham In today s lecture, we will discuss the application of expander graphs to error-correcting codes.

More information

Notes on Computer Theory Last updated: November, Circuits

Notes on Computer Theory Last updated: November, Circuits Notes on Computer Theory Last updated: November, 2015 Circuits Notes by Jonathan Katz, lightly edited by Dov Gordon. 1 Circuits Boolean circuits offer an alternate model of computation: a non-uniform one

More information

6.895 PCP and Hardness of Approximation MIT, Fall Lecture 3: Coding Theory

6.895 PCP and Hardness of Approximation MIT, Fall Lecture 3: Coding Theory 6895 PCP and Hardness of Approximation MIT, Fall 2010 Lecture 3: Coding Theory Lecturer: Dana Moshkovitz Scribe: Michael Forbes and Dana Moshkovitz 1 Motivation In the course we will make heavy use of

More information

Lecture 14: IP = PSPACE

Lecture 14: IP = PSPACE IAS/PCMI Summer Session 2000 Clay Mathematics Undergraduate Program Basic Course on Computational Complexity Lecture 14: IP = PSPACE David Mix Barrington and Alexis Maciel August 3, 2000 1. Overview We

More information

COS597D: Information Theory in Computer Science October 19, Lecture 10

COS597D: Information Theory in Computer Science October 19, Lecture 10 COS597D: Information Theory in Computer Science October 9, 20 Lecture 0 Lecturer: Mark Braverman Scribe: Andrej Risteski Kolmogorov Complexity In the previous lectures, we became acquainted with the concept

More information

Testing Equality in Communication Graphs

Testing Equality in Communication Graphs Electronic Colloquium on Computational Complexity, Report No. 86 (2016) Testing Equality in Communication Graphs Noga Alon Klim Efremenko Benny Sudakov Abstract Let G = (V, E) be a connected undirected

More information

Testing Graph Isomorphism

Testing Graph Isomorphism Testing Graph Isomorphism Eldar Fischer Arie Matsliah Abstract Two graphs G and H on n vertices are ɛ-far from being isomorphic if at least ɛ ( n 2) edges must be added or removed from E(G) in order to

More information

New Hardness Results for Undirected Edge Disjoint Paths

New Hardness Results for Undirected Edge Disjoint Paths New Hardness Results for Undirected Edge Disjoint Paths Julia Chuzhoy Sanjeev Khanna July 2005 Abstract In the edge-disjoint paths (EDP) problem, we are given a graph G and a set of source-sink pairs in

More information

COS597D: Information Theory in Computer Science October 5, Lecture 6

COS597D: Information Theory in Computer Science October 5, Lecture 6 COS597D: Information Theory in Computer Science October 5, 2011 Lecture 6 Lecturer: Mark Braverman Scribe: Yonatan Naamad 1 A lower bound for perfect hash families. In the previous lecture, we saw that

More information

CS Foundations of Communication Complexity

CS Foundations of Communication Complexity CS 49 - Foundations of Communication Complexity Lecturer: Toniann Pitassi 1 The Discrepancy Method Cont d In the previous lecture we ve outlined the discrepancy method, which is a method for getting lower

More information

Cell-Probe Proofs and Nondeterministic Cell-Probe Complexity

Cell-Probe Proofs and Nondeterministic Cell-Probe Complexity Cell-obe oofs and Nondeterministic Cell-obe Complexity Yitong Yin Department of Computer Science, Yale University yitong.yin@yale.edu. Abstract. We study the nondeterministic cell-probe complexity of static

More information

Finite Metric Spaces & Their Embeddings: Introduction and Basic Tools

Finite Metric Spaces & Their Embeddings: Introduction and Basic Tools Finite Metric Spaces & Their Embeddings: Introduction and Basic Tools Manor Mendel, CMI, Caltech 1 Finite Metric Spaces Definition of (semi) metric. (M, ρ): M a (finite) set of points. ρ a distance function

More information

Asymptotic redundancy and prolixity

Asymptotic redundancy and prolixity Asymptotic redundancy and prolixity Yuval Dagan, Yuval Filmus, and Shay Moran April 6, 2017 Abstract Gallager (1978) considered the worst-case redundancy of Huffman codes as the maximum probability tends

More information

Lecture 10 + additional notes

Lecture 10 + additional notes CSE533: Information Theorn Computer Science November 1, 2010 Lecturer: Anup Rao Lecture 10 + additional notes Scribe: Mohammad Moharrami 1 Constraint satisfaction problems We start by defining bivariate

More information

DD2446 Complexity Theory: Problem Set 4

DD2446 Complexity Theory: Problem Set 4 DD2446 Complexity Theory: Problem Set 4 Due: Friday November 8, 2013, at 23:59. Submit your solutions as a PDF le by e-mail to jakobn at kth dot se with the subject line Problem set 4: your full name.

More information

Notes on Complexity Theory Last updated: December, Lecture 2

Notes on Complexity Theory Last updated: December, Lecture 2 Notes on Complexity Theory Last updated: December, 2011 Jonathan Katz Lecture 2 1 Review The running time of a Turing machine M on input x is the number of steps M takes before it halts. Machine M is said

More information

CS Introduction to Complexity Theory. Lecture #11: Dec 8th, 2015

CS Introduction to Complexity Theory. Lecture #11: Dec 8th, 2015 CS 2401 - Introduction to Complexity Theory Lecture #11: Dec 8th, 2015 Lecturer: Toniann Pitassi Scribe Notes by: Xu Zhao 1 Communication Complexity Applications Communication Complexity (CC) has many

More information

Nondeterminism LECTURE Nondeterminism as a proof system. University of California, Los Angeles CS 289A Communication Complexity

Nondeterminism LECTURE Nondeterminism as a proof system. University of California, Los Angeles CS 289A Communication Complexity University of California, Los Angeles CS 289A Communication Complexity Instructor: Alexander Sherstov Scribe: Matt Brown Date: January 25, 2012 LECTURE 5 Nondeterminism In this lecture, we introduce nondeterministic

More information

On the correlation of parity and small-depth circuits

On the correlation of parity and small-depth circuits Electronic Colloquium on Computational Complexity, Report No. 137 (2012) On the correlation of parity and small-depth circuits Johan Håstad KTH - Royal Institute of Technology November 1, 2012 Abstract

More information

1 Circuit Complexity. CS 6743 Lecture 15 1 Fall Definitions

1 Circuit Complexity. CS 6743 Lecture 15 1 Fall Definitions CS 6743 Lecture 15 1 Fall 2007 1 Circuit Complexity 1.1 Definitions A Boolean circuit C on n inputs x 1,..., x n is a directed acyclic graph (DAG) with n nodes of in-degree 0 (the inputs x 1,..., x n ),

More information

How to Compress Interactive Communication

How to Compress Interactive Communication How to Compress Interactive Communication Boaz Barak Mark Braverman Xi Chen Anup Rao March 1, 2013 Abstract We describe new ways to simulate 2-party communication protocols to get protocols with potentially

More information

Computational Complexity: Problem Set 4

Computational Complexity: Problem Set 4 Computational Complexity: Problem Set 4 Due: Friday January 8, 16, at 3:59 AoE. Submit your solutions as a PDF le by e-mail to jakobn at kth dot se with the subject line Problem set 4: your full name.

More information

Time Synchronization

Time Synchronization Massachusetts Institute of Technology Lecture 7 6.895: Advanced Distributed Algorithms March 6, 2006 Professor Nancy Lynch Time Synchronization Readings: Fan, Lynch. Gradient clock synchronization Attiya,

More information

CS 350 Algorithms and Complexity

CS 350 Algorithms and Complexity CS 350 Algorithms and Complexity Winter 2019 Lecture 15: Limitations of Algorithmic Power Introduction to complexity theory Andrew P. Black Department of Computer Science Portland State University Lower

More information

An Improved Approximation Algorithm for Requirement Cut

An Improved Approximation Algorithm for Requirement Cut An Improved Approximation Algorithm for Requirement Cut Anupam Gupta Viswanath Nagarajan R. Ravi Abstract This note presents improved approximation guarantees for the requirement cut problem: given an

More information

Lecture 3,4: Multiparty Computation

Lecture 3,4: Multiparty Computation CS 276 Cryptography January 26/28, 2016 Lecture 3,4: Multiparty Computation Instructor: Sanjam Garg Scribe: Joseph Hui 1 Constant-Round Multiparty Computation Last time we considered the GMW protocol,

More information

Out-colourings of Digraphs

Out-colourings of Digraphs Out-colourings of Digraphs N. Alon J. Bang-Jensen S. Bessy July 13, 2017 Abstract We study vertex colourings of digraphs so that no out-neighbourhood is monochromatic and call such a colouring an out-colouring.

More information

Asymmetric Communication Complexity and Data Structure Lower Bounds

Asymmetric Communication Complexity and Data Structure Lower Bounds Asymmetric Communication Complexity and Data Structure Lower Bounds Yuanhao Wei 11 November 2014 1 Introduction This lecture will be mostly based off of Miltersen s paper Cell Probe Complexity - a Survey

More information

arxiv: v1 [cs.gt] 4 Apr 2017

arxiv: v1 [cs.gt] 4 Apr 2017 Communication Complexity of Correlated Equilibrium in Two-Player Games Anat Ganor Karthik C. S. Abstract arxiv:1704.01104v1 [cs.gt] 4 Apr 2017 We show a communication complexity lower bound for finding

More information

c 2016 Jeff M. Phillips, Elad Verbin, and Qin Zhang

c 2016 Jeff M. Phillips, Elad Verbin, and Qin Zhang SIAM J. COMPUT. Vol. 45, No. 1, pp. 174 196 c 2016 Jeff M. Phillips, Elad Verbin, and Qin Zhang LOWER BOUNDS FOR NUMBER-IN-HAND MULTIPARTY COMMUNICATION COMPLEXITY, MADE EASY JEFF M. PHILLIPS, ELAD VERBIN,

More information

Lecture 1: 01/22/2014

Lecture 1: 01/22/2014 COMS 6998-3: Sub-Linear Algorithms in Learning and Testing Lecturer: Rocco Servedio Lecture 1: 01/22/2014 Spring 2014 Scribes: Clément Canonne and Richard Stark 1 Today High-level overview Administrative

More information

CS264: Beyond Worst-Case Analysis Lecture #18: Smoothed Complexity and Pseudopolynomial-Time Algorithms

CS264: Beyond Worst-Case Analysis Lecture #18: Smoothed Complexity and Pseudopolynomial-Time Algorithms CS264: Beyond Worst-Case Analysis Lecture #18: Smoothed Complexity and Pseudopolynomial-Time Algorithms Tim Roughgarden March 9, 2017 1 Preamble Our first lecture on smoothed analysis sought a better theoretical

More information

An introduction to basic information theory. Hampus Wessman

An introduction to basic information theory. Hampus Wessman An introduction to basic information theory Hampus Wessman Abstract We give a short and simple introduction to basic information theory, by stripping away all the non-essentials. Theoretical bounds on

More information

Integer Linear Programs

Integer Linear Programs Lecture 2: Review, Linear Programming Relaxations Today we will talk about expressing combinatorial problems as mathematical programs, specifically Integer Linear Programs (ILPs). We then see what happens

More information

Induced subgraphs of prescribed size

Induced subgraphs of prescribed size Induced subgraphs of prescribed size Noga Alon Michael Krivelevich Benny Sudakov Abstract A subgraph of a graph G is called trivial if it is either a clique or an independent set. Let q(g denote the maximum

More information

CS 6820 Fall 2014 Lectures, October 3-20, 2014

CS 6820 Fall 2014 Lectures, October 3-20, 2014 Analysis of Algorithms Linear Programming Notes CS 6820 Fall 2014 Lectures, October 3-20, 2014 1 Linear programming The linear programming (LP) problem is the following optimization problem. We are given

More information

16.1 Min-Cut as an LP

16.1 Min-Cut as an LP 600.469 / 600.669 Approximation Algorithms Lecturer: Michael Dinitz Topic: LPs as Metrics: Min Cut and Multiway Cut Date: 4//5 Scribe: Gabriel Kaptchuk 6. Min-Cut as an LP We recall the basic definition

More information

CMSC 858F: Algorithmic Lower Bounds: Fun with Hardness Proofs Fall 2014 Introduction to Streaming Algorithms

CMSC 858F: Algorithmic Lower Bounds: Fun with Hardness Proofs Fall 2014 Introduction to Streaming Algorithms CMSC 858F: Algorithmic Lower Bounds: Fun with Hardness Proofs Fall 2014 Introduction to Streaming Algorithms Instructor: Mohammad T. Hajiaghayi Scribe: Huijing Gong November 11, 2014 1 Overview In the

More information

CS261: A Second Course in Algorithms Lecture #18: Five Essential Tools for the Analysis of Randomized Algorithms

CS261: A Second Course in Algorithms Lecture #18: Five Essential Tools for the Analysis of Randomized Algorithms CS261: A Second Course in Algorithms Lecture #18: Five Essential Tools for the Analysis of Randomized Algorithms Tim Roughgarden March 3, 2016 1 Preamble In CS109 and CS161, you learned some tricks of

More information

6.842 Randomness and Computation Lecture 5

6.842 Randomness and Computation Lecture 5 6.842 Randomness and Computation 2012-02-22 Lecture 5 Lecturer: Ronitt Rubinfeld Scribe: Michael Forbes 1 Overview Today we will define the notion of a pairwise independent hash function, and discuss its

More information