On the communication and streaming complexity of maximum bipartite matching


Ashish Goel, Michael Kapralov, Sanjeev Khanna

November 27, 2

Ashish Goel: Departments of Management Science and Engineering and (by courtesy) Computer Science, Stanford University. ashishg@stanford.edu. Research supported in part by NSF award IIS

Michael Kapralov: Institute for Computational and Mathematical Engineering, Stanford University. kapralov@stanford.edu. Research supported in part by NSF award IIS and a Stanford Graduate Fellowship.

Sanjeev Khanna: Department of Computer and Information Science, University of Pennsylvania, Philadelphia PA. sanjeev@cis.upenn.edu. Supported in part by NSF Awards CCF-696 and IIS

Abstract

Consider the following communication problem. Alice holds a graph $G_A = (P, Q, E_A)$ and Bob holds a graph $G_B = (P, Q, E_B)$, where $|P| = |Q| = n$. Alice is allowed to send Bob a message $m$ that depends only on the graph $G_A$. Bob must then output a matching $M \subseteq E_A \cup E_B$. What is the minimum size of the message $m$ that Alice sends to Bob that allows Bob to recover a matching of size at least $(1-\epsilon)$ times the maximum matching in $G_A \cup G_B$? The minimum message length is the one-round communication complexity of approximating bipartite matching. It is easy to see that the one-round communication complexity also gives a lower bound on the space needed by a one-pass streaming algorithm to compute a $(1-\epsilon)$-approximate bipartite matching. The focus of this work is to understand the one-round communication complexity and one-pass streaming complexity of maximum bipartite matching. In particular, how well can one approximate these problems with linear communication and space? Prior to our work, only a $\frac{1}{2}$-approximation was known for both these problems.

In order to study these questions, we introduce the concept of an $\epsilon$-matching cover of a bipartite graph $G$, which is a sparse subgraph of the original graph that preserves the size of the maximum matching between every subset of vertices to within an additive $\epsilon n$ error. We give a polynomial time construction of a $\frac{1}{2}$-matching cover of size $O(n)$ with some crucial additional properties, thereby showing that Alice and Bob can achieve a $\frac{2}{3}$-approximation with a message of size $O(n)$. While we do not provide bounds on the size of $\epsilon$-matching covers for $\epsilon < 1/2$, we prove that in general, the size of the smallest $\epsilon$-matching cover of a graph $G$ on $n$ vertices is essentially equal to the size of the largest so-called $\epsilon$-Ruzsa-Szemerédi graph on $n$ vertices. We use this connection to show that for any $\delta > 0$, a $(\frac{2}{3} + \delta)$-approximation requires a communication complexity of $n^{1+\Omega(1/\log\log n)}$.

We also consider the natural restriction of the problem in which $G_A$ and $G_B$ are only allowed to share vertices on one side of the bipartition, which is motivated by applications to one-pass streaming with vertex arrivals. We show that a $\frac{3}{4}$-approximation can be achieved with a linear size message in this case, and this result is best possible in that super-linear space is needed to achieve any better approximation.

Finally, we build on our techniques for the restricted version above to design a one-pass streaming algorithm for the case when vertices on one side are known in advance, and the vertices on the other side arrive in a streaming manner together with all their incident edges. This is precisely the setting of the celebrated $(1 - \frac{1}{e})$-competitive randomized algorithm of Karp-Vazirani-Vazirani (KVV) for the online bipartite matching problem [2]. We present here the first deterministic one-pass streaming $(1 - \frac{1}{e})$-approximation algorithm using $O(n)$ space for this setting.

1 Introduction

We study the communication and streaming complexity of the maximum bipartite matching problem. Consider the following scenario. Alice holds a graph $G_A = (P, Q, E_A)$ and Bob holds a graph $G_B = (P, Q, E_B)$, where $|P| = |Q| = n$. Alice is allowed to send Bob a message $m$ that depends only on the graph $G_A$. Bob must then output a matching $M \subseteq E_A \cup E_B$.
What is the minimum size of the message $m$ that Alice sends to Bob that allows Bob to recover a matching of size at least a $(1-\epsilon)$ fraction of the maximum matching in $G_A \cup G_B$? The minimum message length is the one-round communication complexity of approximating bipartite matching, and is denoted by $CC(\epsilon, n)$. It is easy to see that the quantity $CC(\epsilon, n)$ also gives a lower bound on the space needed by a one-pass streaming algorithm to compute a $(1-\epsilon)$-approximate bipartite matching. To see this, consider the graph $G_A \cup G_B$ revealed in a streaming manner with edge set $E_A$ revealed first (in some arbitrary order), followed by the edge set $E_B$. It is clear that any non-trivial approximation to the bipartite matching problem requires $\Omega(n)$ communication and $\Omega(n)$ space, respectively, for the one-round communication and one-pass streaming problems described above. The central question considered in this work is how well we can approximate the bipartite matching problem when only $\tilde{O}(n)$ communication/space is allowed.

Matching Covers: We show that a study of these questions is intimately connected to the existence of sparse matching covers for bipartite graphs. An $\epsilon$-matching cover, or simply an $\epsilon$-cover, of a graph $G(P, Q, E)$ is a subgraph $G'(P, Q, E')$ such that for any pair of sets $A \subseteq P$ and $B \subseteq Q$, the graph $G'$ preserves the size of the largest $A$ to $B$ matching to within an additive error of $\epsilon n$. The notion of matching sparsifiers may be viewed as a natural analog of the notion of cut-preserving sparsifiers, which have played a very important role in the study of network design and connectivity problems [, 4]. It is easy to see that if there exists an $\epsilon$-cover of size $f(\epsilon, n)$ for some function $f$, then Alice can just send a message of size $f(\epsilon, n)$ to allow Bob to compute an additive $\epsilon n$ error approximation to bipartite matching (and a $(1-\epsilon)$-approximation whenever $G_A \cup G_B$ contains a perfect matching). However, we show that the question of constructing efficient $\epsilon$-covers is essentially equivalent to resolving a long-standing problem on a family of graphs known as the Ruzsa-Szemerédi graphs. A bipartite graph $G(P, Q, E)$ is an $\epsilon$-Ruzsa-Szemerédi graph if $E$ can be partitioned into a collection of induced matchings of size at least $\epsilon n$ each. Ruzsa-Szemerédi graphs have been extensively studied as they arise naturally in property testing, PCP constructions and additive combinatorics [7,, 7]. A major open problem is to determine the maximum number of edges possible in an $\epsilon$-Ruzsa-Szemerédi graph. In particular, do there exist dense graphs with large locally sparse regions (i.e. large induced subgraphs are perfect matchings)?
We establish the following somewhat surprising relationship between matching covers and Ruzsa-Szemerédi graphs: for any $\epsilon > 0$ the smallest possible size of an $\epsilon$-matching cover is essentially equal to the largest possible number of edges in an $\epsilon$-Ruzsa-Szemerédi graph. Constructing dense $\epsilon$-Ruzsa-Szemerédi graphs for general $\epsilon$ and proving upper bounds on their size appears to be a difficult problem [9]. To our knowledge, there are two known constructions in the literature. The original construction due to Ruzsa and Szemerédi yields a collection of $n/3$ induced matchings of size $n/2^{O(\sqrt{\log n})}$ using Behrend's construction of a large subset of $\{1, \ldots, n\}$ without three-term arithmetic progressions [3, 7]. Constructions of a collection of $n^{c/\log\log n}$ induced matchings of size $n/3 - o(n)$ were given in [7, 5]. We use the ideas of [7, 5] to construct $(\frac{1}{2} - \delta)$-Ruzsa-Szemerédi graphs with $n^{1+\Omega_\delta(1/\log\log n)}$ edges and a more general construction for the vertex arrival case. To the best of our knowledge, the only known upper bound on the size of $\epsilon$-Ruzsa-Szemerédi graphs for constant $\epsilon < \frac{1}{2}$ is $O(n^2/\log^* n)$, which follows from the bound used in an elementary proof of Roth's theorem [7].

One-round Communication: We show that in fact $CC(\epsilon, n) \le 2n$ for all $\epsilon \ge \frac{1}{3}$, i.e. a message of linear size suffices to get a $\frac{2}{3}$-approximation to the maximum matching in $G_A \cup G_B$. We establish this result by constructing an $O(n)$ size $\frac{1}{2}$-cover of the input graph that satisfies certain additional properties which allow Bob to recover a $\frac{2}{3}$-approximation. We refer to this particular $\frac{1}{2}$-cover as a matching skeleton of the input graph, and give a polynomial time algorithm for constructing it.
Next, building on the above-mentioned connection between matching covers and Ruzsa-Szemerédi graphs, we show the following two results: (a) our construction of a $\frac{1}{2}$-cover implies that for any $\delta > 0$, there do not exist $(\frac{1}{2} + \delta)$-Ruzsa-Szemerédi graphs with more than $O(n/\delta)$ edges, and (b) our $\frac{2}{3}$-approximation result is best possible when only a linear amount of communication is allowed. In particular, Alice needs to send $n^{1+\Omega(1/\log\log n)}$ bits to achieve a $(\frac{2}{3} + \delta)$-approximation, for any constant $\delta > 0$, even when randomization is allowed.

We then study the one-round communication complexity $CC_v(\epsilon, n)$ of $(1-\epsilon)$-approximate maximum matching in the restricted model when the graphs $G_A$ and $G_B$ are only allowed to share vertices on one side of the bipartition. This model is motivated by application to one-pass streaming computations when the vertices of the graph arrive together with all incident edges. We obtain a stronger approximation result in this model, namely, using the preceding $\frac{1}{2}$-cover construction we show that $CC_v(\epsilon, n) \le 2n$ for $\epsilon \ge 1/4$. Thus a $\frac{3}{4}$-approximation can be obtained with linear communication complexity, and as before, we show that obtaining a better approximation requires a communication complexity of $n^{1+\Omega(1/\log\log n)}$ bits.

One-pass Streaming: We build on our techniques for one-round communication to design a one-pass streaming algorithm for the case when vertices on one side are known in advance, and the vertices on the other side arrive in a streaming manner together with all their incident edges. This is precisely the setting of the celebrated $(1 - \frac{1}{e})$-competitive randomized algorithm of Karp-Vazirani-Vazirani (KVV) for the online bipartite matching problem [2]. We give a deterministic one-pass streaming algorithm that matches the $(1 - \frac{1}{e})$-approximation guarantee of KVV using only $O(n)$ space. Prior to our work, the only known deterministic algorithm for matching in the one-pass streaming model, even under the assumption that vertices arrive together with all their edges, is the trivial algorithm that keeps a maximal matching, achieving a factor of $\frac{1}{2}$. We note that in the online setting, randomization is crucial as no deterministic online algorithm can achieve a competitive ratio better than $\frac{1}{2}$.

(Footnote: We note here that a maximum matching in a graph is only a $\frac{2}{3}$-cover.)

Related work: The streaming complexity of maximum bipartite matching has received significant attention recently. Space-efficient algorithms for approximating maximum matchings to within a factor $(1-\epsilon)$ in a number of passes that only depends on $1/\epsilon$ have been developed. The work of [4] gave the first space-efficient algorithm for finding matchings in general (non-bipartite) graphs that required a number of passes dependent only on $1/\epsilon$, although the dependence was exponential. This dependence was improved to polynomial in [5], where a $(1-\epsilon)$-approximation was obtained in $O(1/\epsilon^8)$ passes. In a recent work, [2] obtained a significant improvement, achieving a $(1-\epsilon)$-approximation in $O(\log\log(1/\epsilon)/\epsilon^2)$ passes (their techniques also yield improvements for the weighted version of the problem). Further improvements for the non-bipartite version of the problem have been obtained in []. Despite the large body of work on the problem, the only known algorithm for one pass is the trivial algorithm that keeps a maximal matching. No non-trivial lower bounds on the space complexity of obtaining a constant factor approximation to maximum bipartite matching in one pass were known prior to our work (for exact computation, an $\Omega(n^2)$ lower bound was shown in [6]).

Organization: We start by introducing relevant definitions in section 2. In section 3 we give the construction of the matching skeleton, which we use later in section 4 to prove that $CC(1/3, n) = O(n)$, as well as to show that the matching skeleton forms a $1/2$-cover. In section 5 we deduce using the matching skeleton that $CC_v(1/4, n) = O(n)$.
In section 6 we use these techniques to obtain a deterministic one-pass $(1 - 1/e)$-approximation to maximum matching in $O(n)$ space in the vertex arrival model. We extend the construction of Ruzsa-Szemerédi graphs from [7, 5] in section 7. We use these extensions in section 8 to show that our upper bounds on $CC(\epsilon, n)$ and $CC_v(\epsilon, n)$ are best possible, as well as to prove lower bounds on the space complexity of one-pass algorithms for approximating maximum bipartite matching. Finally, in section 9 we prove the correspondence between the size of the smallest $\epsilon$-matching cover of a graph on $n$ nodes and the size of the largest $\epsilon$-Ruzsa-Szemerédi graph on $n$ nodes.

2 Preliminaries

We start by defining bipartite matching covers, which are matching-preserving graph sparsifiers.

Definition 2.1. Given an undirected bipartite graph $G = (P, Q, E)$, and sets $A \subseteq P$, $B \subseteq Q$, and $H \subseteq E$, let $M_H(A, B)$ denote the size of the largest matching in the graph $G' = (A, B, (A \times B) \cap H)$. Given an undirected bipartite graph $G = (P, Q, E)$ with $|P| = |Q| = n$, a set of edges $H \subseteq E$ is said to be an $\epsilon$-matching-cover of $G$ if for all $A \subseteq P$, $B \subseteq Q$, we have $M_H(A, B) \ge M_E(A, B) - \epsilon n$.

Definition 2.2. Define $L_C(\epsilon, n)$ to be the smallest number $m$ such that any undirected bipartite graph $G = (P, Q, E)$ with $|P| = |Q| = n$ has an $\epsilon$-matching-cover of size at most $m$.

We next define induced matchings and Ruzsa-Szemerédi graphs.

Definition 2.3. Given an undirected bipartite graph $G = (P, Q, E)$ and a set of edges $F \subseteq E$, let $P(F) \subseteq P$ denote the set of vertices in $P$ which are incident on at least one edge in $F$, and analogously, let $Q(F)$ denote the set of vertices in $Q$ which are incident on at least one edge in $F$. Let $E(F)$, called the set of edges induced by $F$, denote the set of edges $E \cap (P(F) \times Q(F))$. Note that $E(F)$ may be much larger than $F$ in general. Given an undirected bipartite graph $G = (P, Q, E)$, a set of edges $F \subseteq E$ is said to be an induced matching if no two edges in $F$ share an endpoint, and $E(F) = F$.
Given an undirected bipartite graph $G = (P, Q, E)$ and a partition $\mathcal{F}$ of $E$, the partition is said to be an induced partition of $G$ if every set $F \in \mathcal{F}$ is an induced matching. An undirected bipartite graph $G = (P, Q, E)$ with $|P| = |Q| = n$ is said to have an $\epsilon$-induced partition if there exists an induced partition of $G$ such that every set in the partition is of size at least $\epsilon n$. Following [7], we refer to graphs that have an $\epsilon$-induced partition as $\epsilon$-Ruzsa-Szemerédi graphs.

Definition 2.4. Let $U_I(\epsilon, n)$ denote the largest number $m$ such that there exists an undirected bipartite graph $G = (P, Q, E)$ with $|E| = m$, $|P| = |Q| = n$, and with an $\epsilon$-induced partition.

Note that for any $0 < \epsilon_1 < \epsilon_2 < 1$, any $\epsilon_2$-induced partition of a graph is also an $\epsilon_1$-induced partition, and hence, $U_I(\epsilon, n)$ is a non-increasing function of $\epsilon$. Analogously, any $\epsilon_1$-matching-cover is also an $\epsilon_2$-matching-cover, and hence, $L_C(\epsilon, n)$ is also a non-increasing function of $\epsilon$.
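The definitions of induced matchings and induced partitions can be checked mechanically on small graphs. The following is a minimal sketch, assuming edges are represented as `(p, q)` pairs; the function names `induced_edges`, `is_induced_matching` and `is_induced_partition` are illustrative and do not come from the paper.

```python
def induced_edges(E, F):
    """Return E(F): the edges of E whose endpoints both touch some edge of F."""
    p_side = {p for p, _ in F}  # P(F)
    q_side = {q for _, q in F}  # Q(F)
    return {(p, q) for p, q in E if p in p_side and q in q_side}


def is_induced_matching(E, F):
    """F is an induced matching if no two edges share an endpoint and E(F) = F."""
    E, F = set(E), set(F)
    if not F <= E:
        return False
    distinct_p = len({p for p, _ in F}) == len(F)
    distinct_q = len({q for _, q in F}) == len(F)
    return distinct_p and distinct_q and induced_edges(E, F) == F


def is_induced_partition(E, parts, eps, n):
    """Check that parts is an eps-induced partition of E (with |P| = |Q| = n)."""
    parts = [set(F) for F in parts]
    covers = set().union(*parts) == set(E)
    disjoint = sum(len(F) for F in parts) == len(set(E))
    return (covers and disjoint
            and all(is_induced_matching(E, F) for F in parts)
            and all(len(F) >= eps * n for F in parts))


# A path a - 0 - b - 1: the two outer edges do not form an induced matching,
# because the middle edge (0, 'b') is induced by their endpoints.
E = {(0, 'a'), (0, 'b'), (1, 'b')}
```

On this example, `{(0, 'a'), (1, 'b')}` is a matching but not an induced one, while splitting `E` into three single edges gives a $\frac{1}{2}$-induced partition for $n = 2$.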

3 Matching Skeletons

Let $G = (P, Q, E)$ be a bipartite graph. We now define a subgraph $G' = (P, Q, E')$ of $G$ that contains at most $|P| + |Q|$ edges, and encodes useful information about matchings in $G$. We refer to this subgraph $G'$ as a matching skeleton of $G$, and this construction will serve as a building block for our algorithms. Among other things, we will show later that $G'$ is a $\frac{1}{2}$-cover of $G$.

We present the construction of $G'$ in two steps. We first consider the case when $P$ is hypermatchable, that is, for every vertex $v \in Q$ there exists a perfect matching of the $P$ side that does not include $v$. We then extend the construction to the general case using the Edmonds-Gallai decomposition [6].

3.1 P is hypermatchable in G

We note that since $P$ is hypermatchable, by Hall's theorem [6], we have that $|\Gamma(A)| > |A|$ for all $A \subseteq P$. For a parameter $\alpha \in (0, 1]$, let

$$R_G(\alpha) = \{A \subseteq P : |\Gamma_G(A)| \le (1/\alpha)|A|\}.$$

Note that as the parameter $\alpha$ decreases, the expansion requirement in the definition above increases. We will omit the subscript $G$ when $G$ is fixed, as in the next lemma.

Lemma 3.1. Let $\alpha \in (0, 1]$ be such that $R(\alpha + \epsilon) = \emptyset$ for any $\epsilon > 0$, i.e. $G$ supports a $\frac{1}{\alpha + \epsilon}$-matching of the $P$-side for any $\epsilon > 0$. Then for any two $A_1 \in R(\alpha)$, $A_2 \in R(\alpha)$ one has $A_1 \cup A_2 \in R(\alpha)$.

Proof. Let $B_1 = \Gamma(A_1)$ and $B_2 = \Gamma(A_2)$. First, since $(A_1 \times (Q \setminus B_1)) \cap E = \emptyset$ and $(A_2 \times (Q \setminus B_2)) \cap E = \emptyset$, we have that $((A_1 \cap A_2) \times (Q \setminus (B_1 \cap B_2))) \cap E = \emptyset$. Furthermore, since $R(\alpha + \epsilon) = \emptyset$, one has $|B_1 \cap B_2| \ge (1/\alpha)|A_1 \cap A_2|$. Also, we have $|B_i| \le |A_i|/\alpha$, $i = 1, 2$. Hence,

$$|B_1 \cup B_2| = |B_1| + |B_2| - |B_1 \cap B_2| \le (1/\alpha)(|A_1| + |A_2| - |A_1 \cap A_2|) = (1/\alpha)|A_1 \cup A_2|,$$

and thus $A_1 \cup A_2 \in R(\alpha)$ as required.

We now define a collection of sets $(S_j, T_j)$, $j = 1, \ldots, +\infty$, where $S_j \subseteq P$, $T_j \subseteq Q$, $S_i \cap S_j = \emptyset$, $i \ne j$.

1. Set $j := 1$, $G_0 := G$, $\alpha_0 := 1$. We have $R_{G_0}(\alpha_0) = \emptyset$.
2. Let $\alpha < \alpha_{j-1}$ be the largest real such that $R_{G_{j-1}}(\alpha) \ne \emptyset$.
3. Let $S = \bigcup_{A \in R_{G_{j-1}}(\alpha)} A$, and $T = \Gamma_{G_{j-1}}(S)$. We have $S \in R_{G_{j-1}}(\alpha)$ by Lemma 3.1.
4. Let $G_j := G_{j-1} \setminus (S \cup T)$. We refer to the value of $\alpha$ at which a pair $(S, T)$ gets removed from the graph as the expansion of the pair. Set $S_j := S$, $T_j := T$, $\alpha_j := \alpha$. If $G_j \ne \emptyset$, let $j := j + 1$ and go to (2).

The following lemma is an easy consequence of the above construction.

Lemma 3.2.
1. For each $U \subseteq S_j$ one has $|\Gamma_{G_{j-1}}(U)| \ge (1/\alpha_j)|U|$.
2. For every $k$, $\left(\bigcup_{j \le k} S_j \times \left(Q \setminus \bigcup_{j \le k} T_j\right)\right) \cap E = \emptyset$.

Proof. We prove (1) by contradiction. When $j = 1$, (1) follows immediately since we are choosing the largest $\alpha$ such that $R(\alpha) \ne \emptyset$. Otherwise suppose that there exists $U \subseteq P_{G_{j-1}}$ such that $|\Gamma_{G_{j-1}}(U)| < (1/\alpha_j)|U|$. Then first observe that $|\Gamma_{G_{j-1}}(U)| > (1/\alpha_{j-1})|U|$. If not, then

$$|\Gamma_{G_{j-2}}(S_{j-1} \cup U)| = |T_{j-1}| + |\Gamma_{G_{j-1}}(U)| \le \frac{1}{\alpha_{j-1}}(|S_{j-1}| + |U|) = \frac{1}{\alpha_{j-1}}|S_{j-1} \cup U|,$$

since $S_{j-1} \cap P_{G_{j-1}} = \emptyset$ by construction; this contradicts the choice of $S_{j-1}$ as the union of all sets in $R_{G_{j-2}}(\alpha_{j-1})$. Now as $\alpha_j < \alpha_{j-1}$ is chosen to be the largest real for which there exists some subset $U' \subseteq P_{G_{j-1}}$ with $|\Gamma_{G_{j-1}}(U')| \le (1/\alpha_j)|U'|$, it follows that for every $U \subseteq P_{G_{j-1}}$, we must have $|\Gamma_{G_{j-1}}(U)| \ge (1/\alpha_j)|U|$. (2) follows by construction.

To complete the definition of the matching skeleton, we now identify the set of edges of $G$ that our algorithm keeps. For a parameter $\gamma$ and subsets $S \subseteq P$, $T \subseteq Q$ we refer to a (fractional) matching $M$ that saturates each vertex in $S$ exactly $\gamma$ times (fractionally) and each vertex in $T$ at most once as a $\gamma$-matching of $S$ in $(S, T, (S \times T) \cap E)$. By Lemma 3.2 there exists a (fractional) $(1/\alpha_j)$-matching of $S_j$ in $(S_j, T_j, (S_j \times T_j) \cap E)$. Moreover, one can ensure that the matching is supported on the edges of a forest by rerouting flow along cycles. Let $M_j$ be a fractional $(1/\alpha_j)$-matching in $(S_j, T_j)$ that is a forest.

Interestingly, the fractional matching corresponding to the matching skeleton is identical to a -majorized fractional allocation of unit-sized jobs to ( ) machines [3, 8]; as a result, the fractional matchings $x_e$ simultaneously minimize all convex functions of the $x_e$'s subject to the constraint that every node in $P$ is matched exactly once.
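Steps 1 to 4 of the construction in Section 3.1 can be traced on small graphs by exhaustive search. The sketch below is only an illustration under stated assumptions: it enumerates all subsets of the remaining $P$ side (exponential time, whereas the paper gives a polynomial time construction), it assumes $P$ is hypermatchable so that every remaining subset has a neighbour, and the name `expanding_pairs` is hypothetical. It uses the fact that the largest $\alpha$ with $R(\alpha) \ne \emptyset$ equals the reciprocal of the minimum ratio $|\Gamma(A)|/|A|$ over nonempty $A$.

```python
from fractions import Fraction
from itertools import combinations


def expanding_pairs(P, edges):
    """Brute-force version of steps 1-4; returns [(alpha_j, S_j, T_j), ...].

    P is a list of sortable left vertices, edges a set of (p, q) pairs.
    """
    adj = {p: {q for u, q in edges if u == p} for p in P}
    left, removed_q, pairs = set(P), set(), []
    while left:
        best, minimizers = None, []
        for size in range(1, len(left) + 1):
            for A in combinations(sorted(left), size):
                # Neighbourhood of A in the current graph G_{j-1}.
                nbrs = set().union(*(adj[p] for p in A)) - removed_q
                ratio = Fraction(len(nbrs), size)  # |Gamma(A)| / |A|
                if best is None or ratio < best:
                    best, minimizers = ratio, [A]
                elif ratio == best:
                    minimizers.append(A)
        if best == 0:
            raise ValueError("a remaining subset has no neighbours")
        S = set().union(*minimizers)  # union of all sets in R(alpha)
        T = set().union(*(adj[p] for p in S)) - removed_q
        pairs.append((1 / best, S, T))  # alpha_j = 1 / (minimum ratio)
        left -= S
        removed_q |= T
    return pairs


# Vertex 0 sees {a, b}; vertex 1 sees {a, c, d, e}.
example_edges = {(0, 'a'), (0, 'b'), (1, 'a'), (1, 'c'), (1, 'd'), (1, 'e')}
example_pairs = expanding_pairs([0, 1], example_edges)
```

In this example the first pair removed is $(\{0\}, \{a, b\})$ with expansion $\alpha_1 = 1/2$, followed by $(\{1\}, \{c, d, e\})$ with $\alpha_2 = 1/3$; the edge $(1, a)$ runs from $S_2$ to $T_1$, consistent with edges leaving $S_j$ only towards $T_i$ with $i \le j$.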

3.2 General bipartite graphs

We now extend the construction to general bipartite graphs using the Edmonds-Gallai decomposition of $G(P, Q, E)$, which essentially allows us to partition the vertices of $G$ into sets $A_P(G)$, $D_P(G)$, $C_P(G)$, $A_Q(G)$, $D_Q(G)$, and $C_Q(G)$ such that $A_P(G)$ is hypermatchable to $D_Q(G)$, $A_Q(G)$ is hypermatchable to $D_P(G)$, and there is a perfect matching between $C_P(G)$ and $C_Q(G)$. The Edmonds-Gallai decomposition theorem is as follows.

Theorem 3.1. (Edmonds-Gallai decomposition, [6]) Let $G = (V, E)$ be a graph. Then $V$ can be partitioned into the union of sets $D(G)$, $A(G)$, $C(G)$ such that

$$D(G) = \{v \in V : \exists \text{ a maximum matching missing } v\},$$
$$A(G) = \Gamma(D(G)),$$
$$C(G) = V \setminus (D(G) \cup A(G)).$$

Moreover, every maximum matching contains a perfect matching inside $C(G)$.

Applying the Edmonds-Gallai decomposition to bipartite graphs, we get

Corollary 3.1. Let $G = (P, Q, E)$ be a bipartite graph. Then $V = P \cup Q$ can be partitioned into the union of sets $D_P(G)$, $D_Q(G)$, $A_P(G)$, $A_Q(G)$, $C_P(G)$, $C_Q(G)$ such that

$$D_P(G) = \{v \in P : \exists \text{ a maximum matching missing } v\},$$
$$D_Q(G) = \{v \in Q : \exists \text{ a maximum matching missing } v\},$$
$$A_P(G) = \Gamma(D_Q(G)), \qquad A_Q(G) = \Gamma(D_P(G)),$$
$$C_P(G) = P \setminus (D_P(G) \cup A_P(G)), \qquad C_Q(G) = Q \setminus (D_Q(G) \cup A_Q(G)).$$

Moreover,
1. there exists a perfect matching between $C_P(G)$ and $C_Q(G)$;
2. for every $U \subseteq A_P(G)$ one has $|\Gamma(U) \cap D_Q(G)| > |U|$;
3. for every $U \subseteq A_Q(G)$ one has $|\Gamma(U) \cap D_P(G)| > |U|$.

Proof. (1) is part of the statement of Theorem 3.1. To show (2), note that by definition of $D_Q(G)$, for each vertex $v \in D_Q(G)$ there exists a maximum matching that misses $v$. Thus, $|\Gamma(U) \cap D_Q(G)| > |U|$ for every set $U \subseteq A_P(G)$. The proof of (3) is analogous.

Using the above partition, we can now define a matching skeleton of $G$. Let $S_0 = C_P(G)$, $T_0 = C_Q(G)$, and let $M_0$ be a perfect matching between $S_0$ and $T_0$. Let $(S_j, T_j)$, $j > 0$, be the expanding pairs obtained by the construction in the previous section on the graph induced by $A_P(G) \cup D_Q(G)$.
Let (S j,t j ),...,(S,T ) be the expanding pairs obtained by the construction in the previous section from the Q side on the graph induced by A Q (G) [ D P (G). Definition 3.. For a bipartite graph G =(P, Q, E) we define the matching skeleton G of G as the union of pairs (S j,t j ),j =,...,+, with corresponding (fractional) matchings M j. Note that G contains at most P + Q edges. As before, we can show the following: Lemma For each U S j, one has T j \ G (U) (/ j) U. 2. For every k>, P \ S S j k S j j k T j \ E = ;, and Q \ S S japple k S j japple k T j \ E = ;. Proof. Follows by construction of G. We note that the formulation of property (2) in Lemma 3.3 is slightly dierent from property (2) in Lemma 3.2. However, one can see that these formulations are equivalent when there are no (S j,t j ) pairs for negative j, as is the case in Lemma O(n) communication protocol for CC( 3,n) In this section, we prove that for any two bipartite graphs G,G 2, the maximum matching in the graph G [ G 2 is at least 2/3 of the maximum matching in G [ G 2, where G is the matching skeleton of G. Thus, CC(, n) =O(n) for all /3; Alice sends the matching skeleton G A of her graph, and Bob computes a maximum matching in the graph G A [ G B. Before proceeding, we establish some notation used for the next several sections. Denote by (S j,t j ),j =,...,+ the set of pairs from the definition of G. Recall that S j P when j and S j Q when j<. Also, given a maximum matching M in a bipartite graph G = (P, Q, E), a saturating cut corresponding to M is a pair of disjoint sets (A [ B,A 2 [ B 2 ) such that A [ A 2 = P, B [ B 2 = Q, all vertices in A 2 [ B are matched by M, there are no matching edges between A 2 and B, and no edges at all between A and B 2. The existence of a saturating cut follows from the maxflow min-cut theorem. Let ALG denote the size of the maximum matching in G [G 2 and let OPT denote the size of the maximum matching in G [ G 2.


Figure 1: Distribution of $(S_j, T_j)$ pairs across the cut.

Figure 2: Matching of the $(S_0, T_0)$ pair.

Consider a maximum matching $M$ in $G_1' \cup G_2$ and a corresponding saturating cut $(A_1 \cup B_1, A_2 \cup B_2)$; note that $ALG = |B_1| + |A_2|$. Let $M^*$ be a maximum matching in $E_1 \cap (A_1 \times B_2)$. Note that we have $OPT \le |B_1| + |A_2| + |M^*|$.

We start by describing the intuition behind the proof. Suppose for simplicity that the matching skeleton $G_1'$ of $G_1$ consists of only one $(S_j, T_j)$ pair for some $j \ge 0$, such that $|T_j| = (1/\alpha_j)|S_j|$. We first note that since the matching $M^*$ is not part of the matching skeleton, it must be that edges of $M^*$ go from $S_j$ to $T_j$. We will abuse notation slightly by writing $M^* \cap X$ to denote, for $X \subseteq P \cup Q$, the subset of nodes of $X$ that are matched by $M^*$. Since all edges of $M^*$ go from $S_j$ to $T_j$, we have $M^* \cap A_1 \subseteq S_j \cap A_1$ and $M^* \cap B_2 \subseteq T_j \cap B_2$. This allows us to obtain a lower bound on $|B_1|$ and $|A_2|$ in terms of $|M^*|$ if we lower bound $|B_1|$ and $|A_2|$ in terms of $|S_j \cap A_1|$ and $|T_j \cap B_2|$ respectively. First, we have that

$$|B_1| \ge |\Gamma_{G_1'}(S_j \cap A_1)| \ge (1/\alpha_j)|S_j \cap A_1| \ge (1/\alpha_j)|M^*|,$$

where we used the fact that the saturating cut is empty in $G_1' \cup G_2$ and Lemma 3.3. Next, we prove that $|\Gamma_{G_1'}(S_j \cap A_2) \cap B_2| \le (1/\alpha_j)|S_j \cap A_2|$ (this is proved in Lemma 4.2 below). This, together with the fact that $M^* \cap B_2 \subseteq T_j \cap B_2 = \Gamma_{G_1'}(S_j \cap A_2) \cap B_2$, implies that $|A_2| \ge \alpha_j |M^*|$. Thus, we always have $|A_2| + |B_1| \ge (\alpha_j + 1/\alpha_j)|M^*|$, and hence the worst case happens at $\alpha_j = 1$, i.e. when the matching skeleton $G_1'$ of $G_1$ consists of only the $(S_0, T_0)$ pair, yielding a $2/3$ approximation.

The proof sketch that we just gave applies when the matching skeleton only contains one pair $(S_j, T_j)$. In the general case, we use Lemma 3.3 to control the distribution of $M^*$ among different $(S_j, T_j)$ pairs.
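For the single-pair case of the proof sketch, the last step from the bound on $|A_2| + |B_1|$ to the $2/3$ ratio can be written out as follows. This is only a restatement of the sketch, using $\alpha_j + 1/\alpha_j \ge 2$ for every $\alpha_j > 0$ together with the bounds $ALG = |B_1| + |A_2|$ and $OPT \le |B_1| + |A_2| + |M^*|$ stated in the text.

```latex
\begin{align*}
ALG &= |B_1| + |A_2| \;\ge\; \left(\alpha_j + \frac{1}{\alpha_j}\right)|M^*| \;\ge\; 2\,|M^*|, \\
OPT &\le |B_1| + |A_2| + |M^*| \;=\; ALG + |M^*| \;\le\; \frac{3}{2}\,ALG,
\end{align*}
```

so that $ALG \ge \frac{2}{3}\,OPT$, with equality in the second inequality of the first line exactly when $\alpha_j = 1$.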
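More precisely, we use the fact that edges of $M^*$ may go from $S_j \cap A_1$ to $T_i \cap B_2$ only if $i \le j$. Another aspect that adds complications to the formal proof is the presence of $(S_j, T_j)$ pairs for negative $j$.

We will use the notation

$$Z_j^* = M^* \cap \begin{cases} S_j \cap A_1, & j > 0 \\ S_j \cap B_2, & j < 0 \end{cases} \qquad\text{and}\qquad W_j^* = M^* \cap \begin{cases} T_j \cap B_2, & j > 0 \\ T_j \cap A_1, & j < 0 \end{cases}$$

for the vertices in $P$ and $Q$ that are matched by $M^*$ (see Fig. 1). Further, let $Z_0^*$ denote the set of vertices in $S_0 \cap A_1$ that are matched by $M^*$ to $B_2 \cap T_0$, and let $W_0^* = M^*(Z_0^*) \subseteq B_2 \cap T_0$. Let $W_0^1 \subseteq S_0 \cap A_1$ denote the vertices in $S_0 \cap A_1$ that are matched by $M^*$ outside of $T_0$. Similarly, let $W_0^2 \subseteq T_0 \cap B_2$ denote the vertices in $T_0 \cap B_2$ that are matched by $M^*$ outside of $S_0$ (see Fig. 2). Let

$$B_1' := B_1 \cap \Big(\Gamma_{G'}(Z_0^*) \cup \Gamma_{G'}(W_0^1) \cup \bigcup_{j > 0} \big(\Gamma_{G'}(Z_j^*) \cup S_{-j}\big)\Big),$$
$$A_2' := A_2 \cap \Big(\Gamma_{G'}(W_0^*) \cup \Gamma_{G'}(W_0^2) \cup \bigcup_{j < 0} \big(\Gamma_{G'}(Z_j^*) \cup S_{-j}\big)\Big).$$

Then since

$$OPT \le |B_1'| + |A_2'| + |M^*| + (|B_1 \setminus B_1'| + |A_2 \setminus A_2'|),$$
$$ALG = |B_1'| + |A_2'| + (|B_1 \setminus B_1'| + |A_2 \setminus A_2'|),$$

it is sufficient to prove that $(|B_1'| + |A_2'|) \ge (2/3)(|B_1'| + |A_2'| + |M^*|)$. Let $OPT' = |B_1'| + |A_2'| + |M^*|$ and $ALG' = |B_1'| + |A_2'|$. Define $\Delta = (OPT' - ALG')/OPT'$. We will now define variables to represent the sizes of the sets used in defining $B_1'$, $A_2'$:

$$w_0^1 = |W_0^1|, \quad w_0^2 = |W_0^2|, \quad z_0 = |Z_0^*|, \quad w_0 = |W_0^*| \quad (\text{note that } z_0 = w_0),$$
$$z_j^* = |Z_j^*|, \quad w_j^* = |W_j^*|, \quad r_j = |\Gamma_{G'}(Z_j^*)|, \quad s_j = \begin{cases} |S_j \cap A_2|, & j > 0 \\ |S_j \cap B_1|, & j < 0. \end{cases}$$

Lemma 4.1 expresses the size of $B_1'$ and $A_2'$ in terms of the new variables defined above.

Lemma 4.1. $ALG' = \sum_{j \ne 0}(s_j + r_j) + (z_0 + w_0) + (w_0^1 + w_0^2)$, and $OPT' \le z_0 + (z_0 + w_0) + (w_0^1 + w_0^2) + \sum_{j \ne 0}(s_j + z_j^* + r_j)$.

Proof. The main idea is that most of the sets in the definitions of $B_1'$ and $A_2'$ are disjoint, allowing us to represent sizes of unions of these sets by sums of sizes of individual sets. For $ALG'$, recall that $\Gamma_{G'}(S_j) = T_j$ and hence, the sets $\Gamma_{G'}(Z_j^*)$ are all disjoint. Further, the sets $S_j$ are all disjoint, by construction, and disjoint from all the $T_j$'s. Thus,

$$|A_2'| + |B_1'| = |\Gamma_{G'}(W_0^1) \cup \Gamma_{G'}(W_0^2)| + |\Gamma_{G'}(Z_0^*) \cup \Gamma_{G'}(W_0^*)| + \sum_{j \ne 0}(s_j + r_j).$$

The sets $W_0^1$ and $W_0^2$ are disjoint. Further, they belong to the $(S_0, T_0)$ pair (corresponding to $\alpha = 1$), and hence nodes in these sets have a single unique neighbor in $G'$; consequently $|\Gamma_{G'}(W_0^1) \cup \Gamma_{G'}(W_0^2)| = w_0^1 + w_0^2$. Similarly, $|\Gamma_{G'}(Z_0^*) \cup \Gamma_{G'}(W_0^*)| = z_0 + w_0$. This completes the proof of the lemma for $ALG'$.

We have $OPT' = ALG' + |M^*|$. Consider any edge $(u, v) \in M^*$. This edge is not in $G'$ and hence must go from an $S_j$ to a $T_{j'}$ where $0 \le j' \le j$ or $0 \ge j' \ge j$. The number of edges in $M^*$ that go from $S_0$ to $T_0$ is precisely $z_0$ by definition; the number of remaining edges is precisely $\sum_{j \ne 0} z_j^*$.

We now derive linear constraints on the size variables, leading to a simple linear program. We have by Lemma 3.3 that for all $k > 0$

$$(4.1)\qquad \Big(\Big(P \setminus \bigcup_{j \ge k} Z_j^*\Big) \times \bigcup_{j \ge k} W_j^*\Big) \cap M^* = \emptyset, \qquad \Big(\Big(Q \setminus \bigcup_{j \le -k} Z_j^*\Big) \times \bigcup_{j \le -k} W_j^*\Big) \cap M^* = \emptyset.$$

The existence of $M^*$ together with (4.1) yields

$$(4.2)\qquad \sum_{j=k}^{+\infty} z_j^* \ge \sum_{j=k}^{+\infty} w_j^*, \ \forall k > 0, \qquad \sum_{j=-\infty}^{-k} z_j^* \ge \sum_{j=-\infty}^{-k} w_j^*, \ \forall k > 0.$$

Furthermore, we have by definition of $W_0^1$, $W_0^2$ together with (4.1) that

$$(4.3)\qquad w_0^1 \le \sum_{j < 0} z_j^* - \sum_{j < 0} w_j^*, \qquad w_0^2 \le \sum_{j > 0} z_j^* - \sum_{j > 0} w_j^*.$$

Also, we have

$$(4.4)\qquad \sum_{j < 0} z_j^* = w_0^1 + \sum_{j < 0} w_j^*, \qquad \sum_{j > 0} z_j^* = w_0^2 + \sum_{j > 0} w_j^*.$$

Next, by Lemma 3.3, we have $r_j \ge (1/\alpha_j) z_j^*$. We also need

Lemma 4.2. (1) $|\Gamma_{G'}(S_j \cap A_2) \cap B_2| \le (1/\alpha_j)|S_j \cap A_2|$ for all $j > 0$, and (2) $|\Gamma_{G'}(S_j \cap B_1) \cap A_1| \le (1/\alpha_j)|S_j \cap B_1|$ for all $j < 0$.

Proof. We prove (1).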
The proof of (2) is analogous. Suppose that $|\Gamma_{G'}(S_j \cap A_2) \cap B_2| > (1/\alpha_j)|S_j \cap A_2|$. Then using the assumption that $(A_1 \times B_2) \cap E' = \emptyset$, we get

$$|T_j| = |T_j \cap B_2| + |T_j \cap B_1| \ge |\Gamma_{G'}(S_j \cap A_2) \cap B_2| + |\Gamma_{G'}(S_j \cap A_1)| > (1/\alpha_j)|S_j \cap A_2| + (1/\alpha_j)|S_j \cap A_1| = (1/\alpha_j)|S_j|,$$

a contradiction to the definition of the matching skeleton.

We will now bound $\Delta = (OPT' - ALG')/OPT'$ using a sequence of linear programs, described in figure 3. We will overload notation to use $P_1$, $P_2$, $P_3$, respectively, to refer to these linear programs as well as their optimum objective function values. By Lemma 4.2 one has for all $j \ne 0$ that $(1/\alpha_j)s_j \ge w_j^*$. We combine this with equations 4.2, 4.3, and 4.4 to obtain the first of our linear programs, $P_1$, in figure 3. Bounding $\Delta$ is equivalent to bounding this LP (i.e. $\Delta \le P_1$). Note that we have implicitly rescaled the variables so that $OPT' \le 1$. We now symmetrize the LP $P_1$ by collecting the variables for the cases when $j$ is positive, negative, and $0$ to obtain LP $P_2$ in figure 3. Finally, we relax LP $P_2$ by combining the second and third constraints, and then establish that the remaining constraints are all tight. This gives us the LP $P_3$ in figure 3. Details of the construction are embedded in the proof of the following lemma.

Lemma 4.3. $P_1 \le P_2 \le P_3$.

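$$P_1 = \max\ z_0 + \sum_{j \ne 0} z_j^* \quad \text{s.t.} \quad \begin{aligned}
& z_0 + (z_0 + w_0) + (w_0^1 + w_0^2) + \sum_{j \ne 0}(s_j + z_j^* + r_j) \le 1 \\
& \sum_{j=k}^{+\infty} z_j^* \ge \sum_{j=k}^{+\infty} w_j^*, \quad \forall k > 0 \\
& \sum_{j=-\infty}^{-k} z_j^* \ge \sum_{j=-\infty}^{-k} w_j^*, \quad \forall k > 0 \\
& (1/\alpha_j)s_j \ge w_j^*, \quad \forall j \ne 0 \\
& r_j \ge (1/\alpha_j) z_j^*, \quad \forall j \ne 0 \\
& \sum_{j < 0} z_j^* = w_0^1 + \sum_{j < 0} w_j^* \\
& \sum_{j > 0} z_j^* = w_0^2 + \sum_{j > 0} w_j^* \\
& z_0 = w_0 \\
& s, z^*, w^*, r, z_0, w_0, w_0^1, w_0^2 \ge 0
\end{aligned}$$

$$P_2 = \max\ \sum_{j=0}^{+\infty} z_j \quad \text{s.t.} \quad \begin{aligned}
& \sum_{j=0}^{+\infty}(s_j + z_j + r_j) \le 1 \\
& \sum_{j=0}^{k} z_j \le \sum_{j=0}^{k} w_j, \quad \forall k \ge 0 \\
& (1/\alpha_j)s_j \ge w_j, \quad \forall j \ge 0 \\
& r_j \ge (1/\alpha_j) z_j, \quad \forall j \ge 0 \\
& s, z, w, r \ge 0
\end{aligned}$$

$$P_3 = \max\ \sum_{j=0}^{+\infty} z_j \quad \text{s.t.} \quad \sum_{j=0}^{+\infty} (\alpha_j + 1 + 1/\alpha_j) z_j \le 1, \quad z \ge 0$$

Figure 3: The linear programs for lower bounding ALG/OPT.

From P1 to P2

We will show that the optimum of the LP $P_2$ in figure 3 is an upper bound for the optimum of $P_1$ in figure 3. First increase the set $\{\alpha_j\}_{j \ne 0}$ to ensure that $\alpha_j = \alpha_{-j}$ (this can only improve the objective function). Now, we define

$$(4.5)\qquad \begin{aligned}
s_j' &= s_j + s_{-j}, & j > 0, \\
r_j' &= r_j + r_{-j}, & j > 0, \\
z_j' &= z_j^* + z_{-j}^*, & j > 0, \\
w_j' &= w_j^* + w_{-j}^*, & j > 0, \\
w_0' &= w_0 + w_0^1 + w_0^2, \\
s_0' &= w_0 + w_0^1 + w_0^2, \\
z_0' &= z_0, \\
r_0' &= z_0.
\end{aligned}$$

We will show that if $s, r, z^*, w^*, z_0, w_0, w_0^1, w_0^2$ are feasible for $P_1$, then $s', r', z', w'$ are feasible for $P_2$ with the same objective function value. First, the objective function is exactly the same by inspection. Constraints 3 and 4 of $P_2$ for $j > 0$ are linear in the respective variables and are hence satisfied. Furthermore, one has

$$(1/\alpha_0)s_0' = w_0 + w_0^1 + w_0^2 = w_0' \qquad\text{and}\qquad r_0' = z_0 = z_0'.$$

Hence, constraints 3 and 4 are satisfied for all $j \ge 0$. To verify that constraint 1 is satisfied, we calculate

$$\begin{aligned}
\sum_{j=0}^{+\infty}(s_j' + z_j' + r_j') &= s_0' + z_0' + r_0' + \sum_{j > 0}(s_j' + z_j' + r_j') \\
&= (w_0 + w_0^1 + w_0^2) + z_0 + z_0 + \sum_{j \ne 0}(s_j + z_j^* + r_j) \\
&= z_0 + (z_0 + w_0) + (w_0^1 + w_0^2) + \sum_{j \ne 0}(s_j + z_j^* + r_j) \le 1.
\end{aligned}$$

We now verify that constraint 2 of $P_2$ is satisfied. First, for $k = 0$ one has $w_0' = w_0 + w_0^1 + w_0^2 \ge w_0 = z_0 = z_0'$. Next, note that by adding constraints 2 and 3 of $P_1$ we get

$$(4.6)\qquad \sum_{|j| \ge k} z_j^* \ge \sum_{|j| \ge k} w_j^*$$

for all $k > 0$. Adding constraints 6 and 7 of $P_1$, we get

$$(4.7)\qquad \sum_{j \ne 0} z_j^* = w_0^1 + w_0^2 + \sum_{j \ne 0} w_j^*.$$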

9 Subtracting (4.7) from (4.6), we get (4.8) k apple w + w 2 + j = k w j. j = Adding z to both sides and using the fact that z = z and w = z + w + w 2, we get (4.9) k apple j= k w j. j= This completes the proof of the first half of lemma 4.3. From P2 to P3 We now bound P2. First we relax the constraints by adding constraint 3 of over j from to k and adding to constraint 2: max (4.) s.t. j= s j + + r j apple j= k (/ j )s j j= k, 8k j= r j (/ j ), 8j x, z, w, r Note that the first constraint is necessarily tight at the optimum. Otherwise scaling all variables to make the constraint tight increases the objective function. We now show that all of the constraints in the second line of (4.) are necessarily tight at the optimum. Indeed, let k be the smallest such that P k j= (/ j)s j > P k j=. Note that one necessarily has s k >. Let s = s e k +( k +/ k ) e k +, r = r, z = z, where e j denotes the vector of all zeros with in position j. Then k k (/ j )s j zj j= j= for all k and (s j + zj + rj)= ( k +/ k ). j= So for su ciently small positive > one has that s = s /( ( k +/ k )) r = r /( ( k +/ k )) z = z /( ( k +/ k )) form a feasible solution with a better objective function value. Thus, one has P k j= (/ j)s j = P k j= for all k and hence (/ j )s j = for all j. Additionally, one necessarily has r j = (/ j ) for all j at optimum. Indeed, otherwise decreasing r j does not violate any constraint and makes constraint slack. Then rescaling variables to restore tightness of constraint improves the objective function. Thus, we need to solve (4.) P 3 = max j= s.t. ( j ++/ j ) apple j z But P3 is easy to analyze: there exists an optimum solution that sets all to zero except for a j that minimizes ( j ++/ j ). For all non-negative x, f(x) = + x +/x is minimized when x =, and f() = 3. This gives P3 apple /3, and hence apple /3, or ALG (2/3)OPT.Thus,wehaveproved Theorem 4.. 
For any bipartite graph $G_1 = (P, Q, E_1)$ there exists a subforest $G'$ of $G_1$ such that for any graph $G_2 = (P, Q, E_2)$ the maximum matching in $G' \cup G_2$ is a 2/3-approximation of the maximum matching in $G_1 \cup G_2$; further, it suffices to choose $G'$ to be the matching skeleton of $G_1$.

Corollary 4.1. $CC(1/3, n) = O(n)$.

Theorem 4.1 also implies that the matching skeleton gives a linear size 1/2-cover of $G$.

Corollary 4.2. For any bipartite graph $G = (P, Q, E)$, the matching skeleton $G'$ is a 1/2-cover of $G$.

Proof. We need to show that for any $A \subseteq P$, $B \subseteq Q$ with $|A|, |B| > n/2$ such that there exists a perfect matching between $A$ and $B$ in $G$, one has $E' \cap (A \times B) \neq \emptyset$. Let $G_2 = (P \cup P', Q \cup Q', M_P \cup M_Q)$ be a graph that consists of a perfect matching from a new set of vertices $P'$ to $Q \setminus B$ and a matching from a new set of vertices $Q'$ to $P \setminus A$. Then the maximum matching in $G \cup G_2$ is of size $(3/2)n$. By the max-flow min-cut theorem, the size of the matching in $G' \cup G_2$ is no larger than $|P \setminus A| + |Q \setminus B| + |E' \cap (A \times B)|$. By Theorem 4.1 the approximation ratio is at least 2/3, and $|P \setminus A| + |Q \setminus B| < n$, so it must be that $|E' \cap (A \times B)| > 0$.
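The counting at the end of this proof can be spelled out as follows. This is a sketch in our own notation: $a = |A| = |B|$, $M^*$ is a maximum matching in $G \cup G_2$, and $M'$ is a maximum matching in $G' \cup G_2$; none of these symbols appear in the original. The values $(3/2)n$ and $n$ quoted in the proof correspond to the boundary case $a = n/2$.

```latex
\begin{align*}
|M^*| &\ge a + (n-a) + (n-a) = 2n - a
  && \text{(perfect matching on } A \times B \text{ plus the edges of } G_2\text{)} \\
|M'| &\ge \tfrac{2}{3}\,(2n - a)
  && \text{(Theorem 4.1)} \\
|M'| &\le |P \setminus A| + |Q \setminus B| + |E' \cap (A \times B)|
      = 2(n-a) + |E' \cap (A \times B)| \\
|E' \cap (A \times B)| &\ge \tfrac{2}{3}\,(2n-a) - 2(n-a) = \frac{4a - 2n}{3} > 0
  && \text{since } a > n/2 .
\end{align*}
```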

5 O(n) communication protocol for $CC_v(1/4, n)$

In this section we prove that $CC_v(\epsilon, n) = O(n)$ for all $\epsilon \geq 1/4$. In particular, we show that given a bipartite graph $G_1 = (P_1, Q, E_1)$, there exists a forest $F \subseteq E_1$ such that for any $G_2 = (P_2, Q, E_2)$ that may share nodes on the $Q$ side with $G_1$ but not on the $P$ side, the maximum matching in $F \cup G_2$ is a 3/4-approximation of the maximum matching in $G_1 \cup G_2$. The broad outline of the proof is similar to the previous section, but we can now assume a special optimal matching using the assumption that $G_2$ may only share nodes with $G_1$ on the $Q$ side. The proof uses the simple lemma below; we state it here since it is also needed in section 6.

Lemma 5.1. Let $G = (P, Q, E)$ be a bipartite graph and let $S \subseteq P$ be such that $|\Gamma(U)| \geq |U|$ for all $U \subseteq S$. Then there exists a maximum matching in $G$ that matches all vertices of $S$.

The proof is quite simple: start with an arbitrary maximum matching and repeatedly find and apply even-length alternating paths originating from unmatched nodes in $S$ and going to matched nodes in $P \setminus S$, to reduce the number of unmatched nodes in $S$. These paths exist by our condition on $S$. The details are deferred to the full version of the paper.

We now state the main theorem of this section. The proof is deferred to the full version of the paper.

Theorem 5.1. Let $G_1 = (P_1, Q, E_1)$, $G_2 = (P_2, Q, E_2)$ be bipartite graphs that share the vertex set on one side. Let $G_1'$ be the matching skeleton of $G_1$. Then the maximum matching in $G_1' \cup G_2$ is a 3/4-approximation of the maximum matching in $G_1 \cup G_2$.

6 One-pass streaming with vertex arrivals

Let $G_i = (P_i, Q, E_i)$ be a sequence of bipartite graphs, where $P_i \cap P_j = \emptyset$ for $i \neq j$. For a graph $G$, we denote by SPARSIFY$(G)$ the matching skeleton of $G$ modified as follows: for each pair $(S_j, T_j)$, $j < 0$, keep an arbitrary matching of $S_j$ to a subset of $T_j$, discarding all other edges, and collect all these matchings into the $(S_0, T_0)$ pair. Note that we have $S_j \subseteq P$, where $P$ is the side of the graph that arrives in the stream.
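As an aside on Lemma 5.1, the statement can be checked on small examples. The sketch below does not follow the alternating-path exchange argument quoted above. It swaps in Kuhn's augmenting-path algorithm and relies on the standard fact that a left vertex, once matched by an augmentation, stays matched in all later augmentations; processing the vertices of S first therefore yields a maximum matching that covers S whenever the Hall-type condition holds on S. The function names and the toy graph are illustrative assumptions, not part of the paper.

```python
from itertools import combinations


def satisfies_hall(adj, S):
    """Brute-force check that |Gamma(U)| >= |U| for every non-empty U within S."""
    for size in range(1, len(S) + 1):
        for U in combinations(S, size):
            neighbours = set().union(*(adj.get(u, ()) for u in U))
            if len(neighbours) < len(U):
                return False
    return True


def maximum_matching(adj, left_order):
    """Kuhn's algorithm; left vertices are tried in the given order.

    Returns a dict {left vertex: right vertex}.
    """
    owner = {}  # right vertex -> left vertex currently matched to it

    def try_augment(u, seen):
        # Depth-first search for an augmenting path starting at left vertex u.
        for v in adj.get(u, ()):
            if v in seen:
                continue
            seen.add(v)
            if v not in owner or try_augment(owner[v], seen):
                owner[v] = u
                return True
        return False

    for u in left_order:
        try_augment(u, set())
    return {u: v for v, u in owner.items()}


# Toy graph (hypothetical): P = {p1, p2, p3}, Q = {q1, q2}.
adj = {"p1": ["q1"], "p2": ["q1", "q2"], "p3": ["q2"]}
S = ["p1", "p3"]
others = [u for u in adj if u not in S]

# Processing S first gives a maximum matching that covers all of S.
matching = maximum_matching(adj, S + others)
```

On this toy graph the order matters: trying the vertices as p1, p2, p3 also produces a maximum matching of size 2, but it leaves p3 unmatched, which is exactly the situation the lemma rules out as unavoidable.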
We have Lemma 6.. Let G =(P, Q, E) be a bipartite graph. Let G = SP ARSIF Y (G). Let (S j,t j ),j =,...,+ denote the set of expanding pairs. Then E\(S i T j )=; for all i<j. Let (6.2) G = SPARSIFY (G ) and G i = SPARSIFY (G i [ G i ),i> We will show that for each > the maximum matching in G is at least a /e fraction of the maximum matching in S i= G i. We will slightly abuse notation by denoting the set of expanding pairs in G by (S ( ),T ( )). Recall that we have 2 (, ], and S ( ) = T ( ). We need the following Definition 6.. For a vertex u 2 P define its level after time, denoted by u ( ), as the value of such that u 2 S ( ). Similarly, for a vertex v 2 Q define its level after time, denoted by v ( ), as the value of such that u 2 T ( ). Note that for a vertex u is at level = u ( ) the expansion of the pair (S ( ),T ( )) that it belongs to is /. Before describing the formal proof, we give an outline of the main ideas. In our analysis, we track the structure of the matching skeleton maintained by the algorithm over time. For the purposes of our analysis, at each time, every vertex is characterized by two numbers: its initial level when it first appeared in the stream and its current level at time (we denote the set of such vertices at time by S, ( )). Informally, we first deduce that the matching edges that our algorithm misses may only connect a vertex in S, ( ) toavertex in T ( ) for, and hence we are interested in the distribution of vertices among the sets S, ( ). We show that vertices that initially appeared at lower levels and then migrated to higher levels are essentially the most detrimental to the approximation ratio. However, we prove that for every 2 (, ], which can be thought of as a barrier, the number of vertices that initially appeared at level < but migrated to a level S can never be larger than 2[,] T ( ) at any time. This leads to a linear program whose optimum lower bounds the approximation ratio, and yields the ( /e) approximation guarantee. 
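Read as $G_1' = \mathrm{SPARSIFY}(G_1)$ and $G_i' = \mathrm{SPARSIFY}(G_{i-1}' \cup G_i)$ for $i > 1$, the recursion (6.2) is a fold over the stream: only the current sparsifier is kept, and it is re-sparsified each time a new batch of $P$-vertices arrives. The sketch below shows this control flow only. The `sparsify` argument is supplied by the caller; `greedy_matching_sparsifier` is a stand-in of our own (a greedy maximal matching) that makes the example runnable. It is not the matching-skeleton-based SPARSIFY and carries none of the $1 - 1/e$ guarantee.

```python
from typing import Callable, FrozenSet, Iterable, Tuple

Edge = Tuple[str, str]  # (vertex in P, vertex in Q)


def run_stream(
    batches: Iterable[Iterable[Edge]],
    sparsify: Callable[[FrozenSet[Edge]], FrozenSet[Edge]],
) -> FrozenSet[Edge]:
    """Fold the stream: G'_i = sparsify(G'_{i-1} union G_i)."""
    kept: FrozenSet[Edge] = frozenset()
    for batch in batches:
        # Only the previous sparsifier and the new batch are ever in memory.
        kept = sparsify(kept | frozenset(batch))
    return kept


def greedy_matching_sparsifier(edges: FrozenSet[Edge]) -> FrozenSet[Edge]:
    """Placeholder sparsifier: a greedy maximal matching (NOT the paper's SPARSIFY)."""
    used_p, used_q, chosen = set(), set(), set()
    for p, q in sorted(edges):  # sorted only to make the example deterministic
        if p not in used_p and q not in used_q:
            used_p.add(p)
            used_q.add(q)
            chosen.add((p, q))
    return frozenset(chosen)


# Vertex arrivals: each batch holds the edges of one new P-vertex.
stream = [
    [("p1", "q1")],
    [("p2", "q1"), ("p2", "q2")],
]
result = run_stream(stream, greedy_matching_sparsifier)
```

Swapping in a real implementation of SPARSIFY only requires passing a different function; the loop itself does not change.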
Lemma 6.2. For all u 2 P and for all, u ( + ) u ( ). Similarlyforv 2 Q, v ( + ) v ( ). Proof. We prove the statement by contradiction. Let be the smallest such that 9 2 (, ] such that R := {u 2 P : u 2 S ( ), u ( + ) < u ( )} 6= ;. Let =min u2r u ( +) (we have <byassumption). Let R = R \ S ( + ). Note that R S ( ). We have (6.3) G (R ) G (R ) (/ ) R > (/) R. + Since G (S ( )) =(/) S ( ), (6.3) implies that S ( ) \ R 6= ;. However, since G (S ( ) \ R ) (/) S ( ) \ R, one has G (S ( ) \ R ) \ G (R ) 6= ;.

11 This, however, contradicts the assumption that (S ( )\ R ) \ S ( + ) = ; and the fact that G + = SPARSIFY (G,G + ). The same argument also proves the monotonicity of levels for v 2 Q. Let S, ( ) denote the set of vertices in u 2 P such that. u 2 S ( ), where is the time when u arrived (i.e. u 2 P ), and 2. u 2 S ( ). Note that one necessarily has all nonempty S,. We will need the following by Lemma 6.2 for Lemma 6.3. For all one has for all 2 (, \ [ [ ( ) A \ E t = ;. 2[,] T ( )) [ 2[,] S, t= Proof. Avertexu 2 S, ( ) with that arrived at time u could only have edges to v 2 T ( u ) for. By Lemma 6.2, such vertices v can only belong to T ( ) for some, and the conclusion follows with the help of Lemma 6.. Let t ( ) = T ( ),s, ( ) = S, ( ). The quantities t ( ),s, ( ) are defined for, 2 D = { k : < k apple / }, where / is a su ciently large integer (note that all relevant values of, are rational with denominators bounded by n). In what follows all summations over levels are assumed to be over the set D. Then Lemma 6.4. For all and for all 2 (, ], the quantities t ( ),s, ( ) satisfy (6.4) t ( ). 2[,] 2(, ] s, ( ) apple ( ) 2[,] Proof. The proof is by induction on. Base: = At = the lhs is zero, so the relation is satisfied. Inductive step:! + Fix 2 (, ). For all 2 (, ] let R ( ) =S ( ) [ S ( + ) A. 2[,] We have G (R ( )) (/ ) R ( ) and G (R ( )) S 2[,] T ( + ). Also, we have by Lemma 6.2 [ T ( ) A [ 2[,] 2(, ] [ G 2[,] (R ( )) A T ( + ). Moreover, since G (R ( )) are disjoint for dierent and disjoint from T ( ), 2 [, ], letting r ( ) = R ( ), wehave (6.5) t ( + ) 2[,] 2[,] 2[,] t ( )+ t ( )+ Furthermore, by Lemma 6.2 (6.6) = 2[,] 2(, ] 2[,] s, ( )+ Since by inductive hypothesis (6.7) t ( ) 2[,] 2(, ] 2[,] 2(, ] 2(, ] s, ( + ) 2(, ] 2(, ] r ( ) r ( ) r ( ). s, ( ). we have by combining (6.5), (6.6) and (6.7) 2[,] 2[,] + = t ( + ) t ( )+ 2[,] 2[,] 2[,] 2(, ] 2(, ] 2(, ] 2(, ] s, ( ) r ( ) (s, ( + ) s, ( )) s, ( + ). 
In what follows we only consider sets S, ( ),T ( ) for fixed, and omit for brevity. Let S = S, S,. Choose a maximum matching M in G that matches all of S, as guaranteed by Lemma 5.. Let denote

12 the number of vertices in T that are matched outside of S by M (note that no vertices of T, 2 (, ) are matched outside of S by lemma 6.3). For each 2 (, ] let r apple t denote the number of vertices in T that are not matched by M. Then the following is immediate from lemma 6.3. Lemma 6.5. For all apple (6.8) 2[,] t 2[,], 2[,] Proof. Follows from Lemma 6.3. We also have (6.9) 2[,] s, + 2(,] s, = 2[,] 2[,] t r +. for all 2 (, ]. By Lemma 6.4 and Lemma 6.5, we get ALG = (t r )+(t r ) 2(,) OPT = ALG + t + r. Thus, we need to minimize ALG/OP T subject to t r +,t,s, and (6.2) 8 2 (, ] : 8 2 (, ] : 8 2 (, ] t + 2[,] 2[,] 2[,] + s, A 2[,] s, apple ( ) 2(, ] 2[,] t. 2(,] s, = 2[,] We start by simplifying (6.2). First note that we can assume without loss of generality that r =. Indeed, if r >, we can decrease r to and increase to keep ALG constant, without violating any constraints, only increasing OPT. Furthermore, we have wlog that t > since otherwise ALG/OP T =. Finally, note that setting t = only makes the ratio ALG/OP P T smaller, so it is su cient to lower bound 2(,) (t r ) in terms of, and for this purpose we can set = since this only fixes the scaling of all variables. Thus, it is su cient to lower bound the P optimum of (6.2), obtaining a lower bound of P + on the ratio ALG/OP T. Combining constraints 2 and 3 of (6.2), we get ( + )t + = = t. t (6.2) P =min s.t. 8 2 (, ] : 8 2 (, ] : 8 2 (, ] 2(,) (t r ) t + 2[,] 2[,] 2[,] Thus, it is su (6.22) P 2 =min s.t. 2[,] t,s,. 2(,) 8 2 (, ] : + s, A 2[,] s, apple ( ) 2(, ] 2[,] t 2(,] s, = 2[,] cient to lower bound the optimum of (t r ) ( + )t + 2[,) r. We first show that one has r = for all 2 [, ) at the optimum. Indeed, suppose that r > for some 2 (, ). Then since the coe cient of t is ( + ) apple <, =, we can decrease r by some > and also decrease t by <, keeping all constraints satisfied and improving the value of the objective function. 
Thus, we arrive at the final LP, whose optimum we need to lower bound: (6.23) P 3 =min s.t. 2(,) t 8 2 (, ] : ( + )t. t. We now show that all constraints are necessarily tight at the optimum. Let 2 [, ] be the largest such that constraint is not tight. Note that one necessarily t

13 has t >. Let t = t e + + e. We now verify that all constrains are satisfied. For > all constraints are satisfied since we did not change t. For =, the constraint is satisfied since it was slack for t and is su ciently small. For <,i.e. apple since we are considering only 2 D, wehave + )t ( = ( + )t ( ) + + ( + ) + = ( ) + )t + + ( ( + )t. Thus, at the optimum we have (6.24) ( + ( ))t =, 8 2 [, ]. Subtracting (6.24) for + from (6.24) for, we get ( + ( ))t (6.25) In other words, + ( + ( + )t = t t =. (6.26) t = t,t. Let =. We now prove by induction that t k = ( + ) k for all k>. Base: k = t = =. Inductive step: k! k + t (k+) = Thus, t (k+) = = (k+) ( + )k ( + ) k ( + ) j A j= k ( + ) j A j= = ( + ) k. Hence, one has 2[,) t / j= ( + ) j = =(+ ) / = ( + )/ ( + ) + / =( ) / Now, the size of the matching M is bounded by OPT apple t +. On the other hand, Thus, we get ALG 2[,) 2[,) t. ALG OPT = P P + = P + P 3 + ( ) / /e since ( ) / apple /e for all. We now prove Theorem 6.. There exists a deterministic O(n) space -pass streaming algorithm for approximating the maximum matching in bipartite graphs to factor /e in the vertex arrival model. Proof. Run the algorithm given in (6.2), letting P i =, i.e. sparsifying as soon as a new vertex comes in. The algorithm only keeps a sparsifier G i in memory, which takes space O(n). 7 Constructions of Ruzsa-Szemerédi graphs In this section we give two extensions of constructions of Ruzsa-Szemerédi graphs from [7]. The first construction shows that for any constant >thereexist(/2 )- Ruzsa-Szemerédi graphs with superlinear number of edges. We use this construction in section 8 to prove that our bound on CC(, n), < /3 is tight. The second construction that we present is a generalization to lop-sided graphs, which we use in section 8 to prove that our bound on CC v (, n), < /4 is tight. Specifically, we show the following results: Lemma 7.. 
For any constant $\delta > 0$ there exists a family of bipartite $(1/2 - \delta)$-Ruzsa-Szemerédi graphs with $n^{1 + \Omega(1/\log\log n)}$ edges.

Lemma 7.2. For any constant $\delta > 0$ there exists a family of bipartite Ruzsa-Szemerédi graphs $G = (X, Y, E)$

with $|X| = n$, $|Y| = 2n$ such that (1) the edge set $E$ is a union of $n^{\Omega(1/\log\log n)}$ induced 2-matchings $M_1, \ldots, M_k$ of size at least $(1/2 - O(\delta))|X|$, and (2) for any $j \in [1 : k]$ the graph $G$ contains a matching $M_j^*$ of size at least $(1 - O(\delta))|X|$ that avoids $Y \setminus (M_j \cap Y)$.

The proofs of these results are based on an adaptation of Theorem 6 in [7] (see also [5]), which constructs bipartite 1/3-Ruzsa-Szemerédi graphs with superlinear number of edges. The main idea of the construction, the use of a large family of nearly orthogonal vectors derived from known families of error correcting codes, is the same. A technical step is required to go from matchings of size 1/3 to matchings of size $1/2 - \delta$ for any $\delta > 0$. Since the result does not follow directly from [7], we give a complete proof in the full version.

8 Lower bounds on communication and one-pass streaming complexity

We show here that lower bounds on the size of Ruzsa-Szemerédi graphs yield lower bounds on the (randomized) communication complexity, and hence for one-pass streaming complexity. In the edge model, we show that $CC\left(\frac{2(1-\epsilon)}{2-\epsilon-\delta}, (2-\epsilon)n\right) = \Omega(U_I(\epsilon, n))$ for all $\epsilon, \delta > 0$. In particular, combined with the constructions of $(1/2 - \delta)$-Ruzsa-Szemerédi graphs for any constant $\delta > 0$ (Lemma 7.1), this proves that $CC(\epsilon, n) = n^{1 + \Omega(1/\log\log n)}$ for $\epsilon < 1/3$. Thus our $O(n)$ upper bound on $CC(1/3, n)$ in section 4 is optimal in the sense that any better approximation requires super-linear communication. As a corollary, we also get that super-linear space is necessary to achieve better than 2/3-approximation in the one-pass streaming model.

In the vertex model, using the construction of Ruzsa-Szemerédi graphs from Lemma 7.2, we show that $CC_v(\epsilon, n) = n^{1 + \Omega(1/\log\log n)}$ for all $\epsilon < 1/4$. This proves optimality of our construction in section 5, and also shows that super-linear space is necessary to achieve better than 3/4-approximation in the one-pass streaming model even in the vertex arrival setting.
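To see where the 2/3 threshold comes from, one can plug the parameter of Lemma 7.1 into the ratio between the bound $2(1-\epsilon)n$ on Bob's output and the optimum of roughly $(2-\epsilon)n$ in the edge-arrival construction of Section 8.1. The following is a sketch of that arithmetic under the simplifying assumption that the lower-order slack terms are ignored.

```latex
\[
\epsilon = \tfrac12 - \delta
\;\Longrightarrow\;
\frac{2(1-\epsilon)}{2-\epsilon}
= \frac{1 + 2\delta}{\tfrac32 + \delta}
= \frac{2 + 4\delta}{3 + 2\delta}
= \frac23 + \frac{8\delta}{3(3 + 2\delta)}
\;\xrightarrow[\;\delta \to 0\;]{}\; \frac23 .
\]
```

The same computation in the vertex model, with a bound of about $(3/2)n$ against an optimum of about $2n$, gives the 3/4 threshold.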
We note that our lower bounds for both the edge and vertex arrival case apply to randomized algorithms. The proofs of these results appear in the full version.

8.1 Edge arrivals

Lemma 8.1. For any $\epsilon > 0$ and $\delta > 0$, $CC\left(\frac{2(1-\epsilon)}{2-\epsilon-\delta}, (2-\epsilon)n\right) = \Omega(U_I(\epsilon, n))$.

Proof. For any $\epsilon > 0$, we will construct a distribution over bipartite graphs with $(2-\epsilon)n$ vertices on each side such that each graph in the distribution contains a matching of size at least $(2-\epsilon)n - \delta n$. On the other hand, we will define a partition of the edge set $E'$ of the graph into $E' = E_1 \cup E_2$ and show that for any deterministic communication protocol using message size $s = o(U_I(\epsilon, n))$, the expected size of the matching computed is bounded by $2(1-\epsilon)n + o(n)$. Using Yao's minimax principle, we get the desired performance bound for any protocol with $o(U_I(\epsilon, n))$ communication.

Let $G = (P, Q, E)$ be an $\epsilon$-RS graph with $n$ vertices on each side and $U_I(\epsilon, n)$ edges. By definition, $E$ can be partitioned into $k$ induced matchings $M_1, \ldots, M_k$, where $|M_i| = \epsilon n$ for $1 \leq i \leq k$, and $k = U_I(\epsilon, n)/(\epsilon n)$. We generate a random bipartite graph $G' = (P_1 \cup P_2, Q_1 \cup Q_2, E_1 \cup E_2)$ with $(2-\epsilon)n$ vertices on each side, as follows:

1. We set $P_1 = P$ and $Q_1 = Q$. Also, let $P_2$ and $Q_2$ be sets of $(1-\epsilon)n$ vertices each that are disjoint from $P$ and $Q$.
2. For each $M_i$, $i = 1, \ldots, k$, let $M_i'$ be a subset of $M_i$ of size $(\epsilon - \delta)n$ chosen uniformly at random. We set $E_1 = \bigcup_{i=1}^{k} M_i'$.
3. Choose a uniformly random $r \in [1 : k]$. Let $M_1^*$ be an arbitrary perfect matching between $P_2$ and $Q \setminus Q(M_r)$, and let $M_2^*$ be an arbitrary perfect matching between $Q_2$ and $P \setminus P(M_r)$. We set $E_2 = M_1^* \cup M_2^*$.

The instance $G'$ is partitioned between Alice and Bob as follows: Alice is given all edges in $G_1 = (P_1, Q_1, E_1)$ (first phase), and Bob is given all edges in $G_2 = (P_2, Q_2, E_2)$ (second phase). Clearly, any optimal matching in $G'$ has size at least $(2-\epsilon)n - \delta n$; consider, for instance, the matching $M_r' \cup M_1^* \cup M_2^*$.
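The three sampling steps above can be sketched in code. The function below is a minimal illustration under stated assumptions: the induced matchings are passed in as lists of `(p, q)` index pairs over `range(n)`, the number of edges retained per matching is an explicit parameter `keep` rather than a formula in the paper's parameters, and the vertex labels `"P"`, `"Q"`, `"P2"`, `"Q2"` are our own naming. The toy input is a perfect matching split into two induced matchings, not an actual Ruzsa-Szemerédi graph with superlinearly many edges.

```python
import random


def hard_instance(n, matchings, keep, rng):
    """Sample Alice's edges E1, Bob's edges E2 and the hidden index r."""
    # Step 2: Alice gets a random subset of `keep` edges from every matching.
    e1 = set()
    for m in matchings:
        for p, q in rng.sample(sorted(m), keep):
            e1.add((("P", p), ("Q", q)))

    # Step 3: pick the special matching M_r uniformly at random.
    r = rng.randrange(len(matchings))
    covered_p = {p for p, _ in matchings[r]}
    covered_q = {q for _, q in matchings[r]}
    free_q = [q for q in range(n) if q not in covered_q]
    free_p = [p for p in range(n) if p not in covered_p]

    # Bob's edges: fresh vertices P2 matched perfectly to Q \ Q(M_r),
    # and fresh vertices Q2 matched perfectly to P \ P(M_r).
    e2 = {(("P2", i), ("Q", q)) for i, q in enumerate(free_q)}
    e2 |= {(("P", p), ("Q2", i)) for i, p in enumerate(free_p)}
    return e1, e2, r


# Toy input (hypothetical): two induced matchings of size 2 on 4 + 4 vertices.
toy_matchings = [[(0, 0), (1, 1)], [(2, 2), (3, 3)]]
e1, e2, r = hard_instance(4, toy_matchings, keep=1, rng=random.Random(0))
```

Whatever index `r` is drawn, Bob's edges never touch a vertex covered by $M_r$, so the retained edges of $M_r$ together with Bob's two perfect matchings always form a matching.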
We now show that for any deterministic communication protocol using communication at most s = o(u I (, n)), with probability at least ( o()), number of edges in Mr retained by the algorithm at the end of the first phase is o(n). Assuming this claim, we get that with probability at least ( o()), the size of the matching output by Bob is bounded by 2( )n+o(n). Hence the expected size of the matching output by Bob is bounded by 2( )n + o(n). We now establish the preceding claim. We start by observing that the number of distinct first phase graphs is at least (assume < /2) k n = n n U I (,n) n =2 U I (,n), n for some positive bounded away from. Let G denote the set of all possible first phase graphs, and let

15 : G!{, } s be the mapping used by Alice to map graphs in G to a message of size s = o(u I (, n)). For any graph H 2G, let (H) ={H (H )= (H)}. Then note that for any graph H 2G, Bob can output an edge e in the solution i e occurs in every graph H 2 (H). For any subset F of G, letg F denote the unique graph obtained by intersection of all graphs in F (i.e. the graph G F contains an edge e i e is present in every graph in the family F ). Claim 8.. For any < < 2 and any subset F of G, let I {, 2,...,k} be the set of indices such that G F contains at least n edges from M i for each i 2 I. Then if F 2 ( o())u I (,n), I = o(k). The details of the proof are deferred to the full version of the paper. To conclude the proof, we note that a simple counting argument shows that for a uniformly at random chosen graph H 2G, with probability at least o(), we have (H) 2 ( o())u I (,n). Conditioned on this event, it follows from claim 8. that for a randomly chosen index r 2 [..k], with probability at least o(), the graph G (H) contains at most n edges from M r. In particular, we get Corollary 8.. For any >, CC(2/3 +, n) = n + (/ log log n). Proof. Follows by putting together Lemma 7. and Lemma 8.. Lower bounds on communication complexity translate directly into bounds on one-pass streaming complexity: Corollary 8.2. For any constant > any (possibly randomized) one-pass streaming algorithm that achieves 2( ) approximation factor 2 + must use (U I (, n)) space. In particular, any one-pass streaming algorithm that achieves approximation factor 2/3 + must use n + (/ log log n) space. Proof. Follows by Lemma 7. and Lemma Vertex arrivals We now prove a lower bound on the communication complexity in the vertex arrival model using the construction of lop-sided Ruzsa- Szemerédi graphs from Lemma 7.2. The bound implies that our upper bound from section 5 is tight. Moreover, the bound yields the first lower bound on the streaming complexity in the vertex arrival model. Lemma 8.2. 
For any constant $\delta > 0$, $CC_v(3/4 + \delta, n) = n^{1 + \Omega(1/\log\log n)}$.

Proof. For sufficiently small $\delta > 0$, we will construct a distribution over bipartite graphs with $(2 + \delta)n$ vertices on each side such that each graph in the distribution contains a matching of size at least $(2 - O(\delta))n$. On the other hand, we will show that for any deterministic protocol using space $s = n^{1 + o(1/\log\log n)}$, the expected size of the matching computed is bounded by $(3/2 + O(\delta))n + o(n)$. Using Yao's minimax principle we get the desired performance bound for any $n^{1 + o(1/\log\log n)}$-space randomized protocol.

Let $G = (P, Q, E)$ be a $(1/2 - \delta')$-RS graph with $|P| = n$, $|Q| = 2n$ and $n^{1 + \Omega(1/\log\log n)}$ edges, as guaranteed by Lemma 7.2. By definition, $E$ can be partitioned into $k$ induced 2-matchings $M_1, \ldots, M_k$, where $|M_i| \geq (1/2 - \delta')n$ for $1 \leq i \leq k$, $k = n^{\Omega(1/\log\log n)}$, and $\delta' = O(\delta)$. We generate a random bipartite graph $G' = (P_1 \cup P_2, Q, E_1 \cup E_2)$ with $(2 + \delta)n$ vertices on each side, as follows:

1. We set $P_1 = P$ and let $P_2$ be a set of $(1 + \delta)n$ vertices that are disjoint from $P$.
2. For each $M_i$, $i = 1, \ldots, k$, let $M_i'$ be a subset of $M_i$ of size $(1/2 - 2\delta')n$ chosen uniformly at random. We set $E_1 = \bigcup_{i=1}^{k} M_i'$.
3. Choose a uniformly random $r \in [1 : k]$. Let $M^*$ be an arbitrary perfect matching between $P_2$ and $Q \setminus Q(M_r)$. We set $E_2 = M^*$.

Let Alice hold the graph $G_A = (P_1, Q, E_1)$ and let Bob hold the graph $G_B = (P_2, Q, E_2)$. By Lemma 7.2, there exists a matching $M_r^*$ that matches at least a $(1 - \delta')$ fraction of $X = P$ and avoids $Q \setminus Q(M_r)$. Thus, any optimal matching in $G_A \cup G_B$ has size at least $(2 - O(\delta))n$; consider, for instance, the matching $M_r^* \cup M^*$. However, no deterministic protocol can output more than a $\delta'' = O(\delta)$ fraction of the edges in $M_r^*$ if it uses $n^{1 + o(1/\log\log n)}$ space, by the same argument as in Lemma 8.1. Hence, the size of the matching output by the protocol is bounded above by $(1/2 + O(\delta))|P_1| + |P_2| = (3/2 + O(\delta))n$.

We immediately get Corollary 8.3.
For any constant $\delta > 0$, any (possibly randomized) one-pass streaming algorithm that achieves approximation factor $3/4 + \delta$ must use $n^{1 + \Omega(1/\log\log n)}$ space.

9 Matching covers & Ruzsa-Szemerédi graphs

In this section we prove that the size of the smallest possible matching cover is essentially the same as the number of edges in the largest Ruzsa-Szemerédi graph with appropriate parameters.
