Minimum-cost matching in a random bipartite graph with random costs

Minimum-cost matching in a random bipartite graph with random costs Alan Frieze Tony Johansson Department of Mathematical Sciences Carnegie Mellon University Pittsburgh PA 523 USA June 2, 205 Abstract Let G G n,n,p be the random bipartite graph on n+n vertices, where each e [n] 2 appears as an edge independently with probability p Suppose that each edge e is given an independent uniform exponential rate one cost Let CG denote the expected length of the minimum cost perfect matching We show that whp if d log n 2 then E [CG] + o π2 6p This generalises the well-known result for the case G K n,n Introduction There are many results concerning the optimal value of combinatorial optimization problems with random costs Sometimes the costs are associated with n points generated uniformly at random in the unit square [0, ] 2 In which case the most celebrated result is due to Beardwood, Halton and Hattersley [3] who showed that the minimum length of a tour through the points as grew as βn /2 for some still unknown β For more on this and related topics see Steele [8] The optimisation problem in [3] is defined by the distances between the points So, it is defined by a random matrix where the entries are highly correlated There have been many examples considered where the matrix of costs contains independent entries Aside from the Travelling Salesperson Problem, the most studied problems in combinatorial optimization are perhaps, the shortest path problem; the minimum spanning tree problem and the matching problem As a first example, consider the shortest path problem in the complete graph K n where the edge lengths are independent exponential random variables with rate We denote the exponential random variable with rate λ by Eλ Thus PrEλ x e λx for x R Janson [9] proved among other things that if X i,j denotes the shortest distance between vertices i, j in this model then E [X,2 ] Hn n where H n n i i Research supported in part by NSF Grant DMS362785 Email: alan@randommathcmuedu Research supported in part by NSF Grant DMS362785 Email: tjohanss@andrewcmuedu

As far as the spanning tree problem is concerned, the first relevant result is due to Frieze [6] He showed that if the edges of the complete graph are given independent uniform [0, ] edge weights, then the random minimum length of a spanning tree L n satisfies E [L n ] ζ3 i i 3 as n Further results on this question can be found in Janson [8], Beveridge, Frieze and McDiarmid [4], Frieze, Ruszinko and Thoma [7] and Cooper, Frieze, Ince, Janson and Spencer [5] In the case of matchings, the nicest results concern the the minimum cost of a matching in a randomly edge-weighted copy of the complete bipartite graph K n,n If C n denotes the random minimum cost of a perfect matching when edges are given independent exponential E random variables then the story begins with Walkup [9] who proved that E [C n ] 3 Later Karp [0] proved that E [C n ] 2 Aldous [], [2] proved that lim n E [C n ] ζ2 k Parisi k 2 [3] conjectured that in fact E [C n ] n k This was proved independently by Linusson and k 2 Wästlund [] and by Nair, Prabhakar and Sharma [2] A short elegant proof was given by Wästlund [6], [7] In the paper [4] on the minimum spanning tree problem, the complete graph was replaced by a d-regular graph G Under some mild expansion assumptions, it was shown that if d then ζ3 can be replaced asymptotically by n d ζ3 Consider a d-regular bipartite graph G on 2N vertices Here d dn as N Each edge e is assigned a cost W e, each independently chosen according to the exponential distribution E Denote the total cost of the minimum-cost perfect matching by CG We conjecture the following under some possibly mild restrictions: Conjecture Suppose d dn as N For any d-regular bipartite G, Here the o term goes to zero as N E [CG] + o N d π 2 6 In this paper we prove the conjecture for random bipartite graphs Let G G n,n,p be the random bipartite graph on n + n vertices, where each e [n] 2 appears as an edge independently with probability p Suppose that each edge e is given an independent uniform exponential rate one cost Theorem If d ωlog n 2 where ω then whp E [CG] π2 6p Here A n B n iff A n + ob n as n and the event E n occurs with high probability whp if PrE n o as n Applying results of Talagrand [4] we can prove the following concentration result Theorem 2 Let ε > 0 be fixed, then Pr CG π2 6p ε n K, p for any constant K > 0 2

2 Proof of Theorem We find that the proofs in [6], [7] can be adapted to our current situation Suppose that the vertices of G consist of A {a i, i [n]} and B {b j, j [n]} Let Cn, r denote the expected cost of the minimum cost matching We will prove that whp M r {a i, φ r a i : i, 2,, r} of A r {a, a 2,, a r } into B Cn, r Cn, r p r i0 rn i for r, 2,, n m where m n ω /2 log n Using this we argue that E [CG] Cn, n Cn, n Cn, n m + + + o p We will then show that r r i0 Theorem follows from these two statements Let B r {φ r a i : i, 2,, r} Lemma B r is a random r-subset of B rn i k r r i0 rn i 2 k 2 π2 6 3 Cn, n Cn, n m + op 4 Proof Let L denote the n n matrix of edge costs, where Li, j W a i, b j and Li, j if edge a i, b j does not exist in G For a permutation π of B let L π be defined by L π i, j Li, πj Let X, Y be two distinct r-subsets of B and let π be any permutation of B that takes X into Y Then we have PrB r L X PrB r L π πx PrB r L π Y PrB r L Y, where the last equality follows from the fact that L and L π have the same distribution We use the above lemma to bound degrees For v A let d r v {w B \ B r : v, w EG} Then we have the following lemma: Lemma 2 d r v n rp ω /5 n rp whp for v A, 0 r n m 3

Proof This follows from Lemma ie B \ B r is a random set and the Chernoff bounds viz Pr d r v n rp ω /5 n rp 2e ω 2/5 n rp/3 2n ω/0 We can now use the ideas of [6], [7] We add a special vertex b n+ to B, with edges to all n vertices of A Each edge adjacent to b n+ is assigned an Eλ cost independently, λ > 0 We now consider M r to be a minimum cost matching of A r into B B {b n+ } We denote this matching by M r and we let B r denote the corresponding set of vertices of B that are covered by M r Lemma 3 Suppose r < n m Then Prb n+ B r b n+ / B r λ pn r + + ε r + λ 5 where ε r ω /5 Proof Assume that b n+ / Br M r is obtained from Mr by finding an augmenting path P a r,, a σ, b τ from a r to B \ Br of minimum additional cost Let α W σ, τ We condition on i σ, ii the lengths of all edges other than a σ, b j, b j B \ Br and iii min { Aσ, j : b j B \ Br } α With this conditioning Mr Mr will be fixed and so will P a r,, a σ We can now use the following fact: Let X, X 2,, X M be independent exponential random variables of rates α, α 2,, α M Then the probability that X i is the smallest of them is α i /α + α 2 + + α M Furthermore, the probability stays the same if we condition on the value of min {X, X 2,, X M } Thus Prb n+ B r b n+ / B r λ d r a σ + λ Corollary Prb n+ B r p as λ 0, where ε r ω /5 n + n + + + ε k λ + Oλ 2 6 n r + Proof Let νj p n j + ε j, ε j ω /5 Then the probability is given by ν0 ν0 + λ ν ν + λ νr νr + λ + λ λ + ν0 νr ν0 + ν + + νr p λ + Oλ 2 n + ε 0 + n + ε + + n r + + ε r and each error factor satisfies / + ε j ω /5 λ + Oλ 2 4

Lemma 4 If r n m then E [Cn, r Cn, r ] r + ε k rp n i i0 7 where ε k ω /5 Proof Let X be the cost of M r and let Y be the cost of M r Let w denote the cost of the edge a r, b n+, and let I denote the indicator variable for the event that the cost of the cheapest A r -assignment that contains this edge is smaller than the cost of the cheapest A r -assignment that does not use b n+ In other words, I is the indicator variable for the event {Y + w < X} It follows from Corollary and symmetry to obtain the factor /r that the probability that a r, b n+ Mr is given by rp n + n + + + ε k λ + Oλ 2 8 n r + as λ 0, since each edge adjacent to b n+ is equally likely to participate in M r If a r, b n+ M r then w < X Y Conversely, if w < X Y and no other edge from b n+ has cost smaller than X Y, then a r, b n+ M r, and when λ 0, the probability that there are two distinct edges from b n+ of cost smaller than X Y is of order Oλ 2 On the other hand, w is Eλ distributed, so E [I] Pr {w < X Y } E [ e λx Y ] [ E e λx Y ] 9 Hence E [I], regarded as a function of λ, is essentially the Laplace transform of X Y In particular E [X Y ] is the derivative of E [I] evaluated at λ 0, so E [X Y ] d dλ E [I] λ0 rp n + n + + + ε k 0 n r + where ε k ω /5 Now clearly, as λ 0, E [X] Cn, r and E [Y ] Cn, r and the lemma follows This confirms 2 and we turn to 3 We use the following expression from Young [20] n i Let m ω /4 m Observe first that m i0 n i ri+ o + 2 n i log n + γ + 2n + On 2, where γ is Euler s constant r o + m in 3/4 m in 3/4 log n i n i + ri+ r 2n m + On 3/2 o + 2 n m n log m! o + 2m ne n log o 2 m 5

Then, r r i0 rn i i0 n i n i im n i im ri+ ri+ log n i log im jm+ xm+ j log x log r, r + o, n m i n m i n m n j n m n x + 2n m 2i + Oi 2 + o, + o, + o, 3 dx + o We can replace the sum in by an integral because the terms are all o and the sequence of summands is unimodal Continuing, we have Observe next that m So, y 0 xm+ n m x log n x xm+ k xm+ m y x log y + m kn m k dy klog n If k log n then we write Now m y y + m kn m k dy m y y k y k xm+ dx x m n m x m k x kn m k dx dx dy 4 y + m kn m k y k m y0 x m k x kn m k dx m y y k kn m k dy k 2 klog n y + m k m kn m k dy + y y + m k kn m k dy n m k m + k k 2 n m k k 2 + O 6 o 5 k2 y k y + m k y + mkn m k dy kω /4 log n 6

If k then m y And if 2 k log n then It follows that 0 m y log n k y + m k y k y + mkn m k dy m logn m o n m y + m k y k k y + mkn m k dy m y m l y m k l y0 equation 3 now follows from 4, 5, 6 and 7 Turning to 4 we prove the following lemma: k l y k l m l y + mkn m k dy k y k l m l l kn m k dy k k m l n m m k l l kk ln m k l km O O kk n kω /2 log n y + m k y k log n dy o + O y + mkn m k kω /2 o 7 log n k2 Lemma 5 If r n m then 0 Cn, r + Cn, r O This will prove that m log n 0 Cn, n Cn m + O and complete the proof of 4 and hence Theorem log n n O ω /2 o p 2 Proof of Lemma 5 Let we denote the weight of edge e in G Let V r A r+ B and let G r be the subgraph of G induced by V r For a vertex v V r order the neighbors u, u 2,, of v in G r so that wv, u i wv, u i+ Define the k-neighborhood N k v {u, u 2,, u k } Let the k-neighborhood of a set be the union of the k-neighborhoods of its vertices In particular, for S A r+, T B, N k S {b B : a S : y N k a}, 8 N k T {a A r+ : b T : a N k b} 9 Given a function φ defining a matching M of A r into B, we define the following digraph: let Γr V r, X where X is an orientation of X {{a, b} G : a A r+, b N 40 a} {{a, b} G : b B, a N 40 b} {φa i, a i : i, 2,, r} 7

An edge e M is oriented from B to A and has weight w r e we The remaining edges are oriented from A to B and have weight equal to their weight in G The arcs of directed paths in Γ r are alternately forwards A B and backwards B A and so they correspond to alternating paths with respect to the matching M It helps to know Lemma 6, next that given a A r+, b B we can find an alternating path from a to b with Olog n edges The ab-diameter will be the maximum over a A r+, b B of the length of a shortest path from a to b Lemma 6 Whp, for every φ, the unweighted ab-diameter of Γ r is at most k 0 3 log 4 n Proof For S A r+, T B, let NS {b B : a S such that a, b X}, NT {a A r+ : b T such that a, b X} We first prove an expansion property: that whp, for all S A r+ with S n/5, NS 4 S Note that NS, NT involve edges oriented from A to B and so do not depend on φ Pr S : S n/5, NS < 4 S o + n/5 s n/5 s n/5 ne s s 4s r + n 40 s 4s n 40 s ne 4s e 5 4 36 s 35 n 35 4s 4s 40s n s s o 20 Explanation: The o term accounts for the probability that each vertex has at least 40 neighbors in Γ r Condition on this Over all possible ways of choosing s vertices and 4s targets, we take the probability that for each of the s vertices, all 40 out-edges fall among the 4s out of the n possibilities Similarly, whp, for all T B with T n/5, NT 4 T Thus by the union bound, whp both these events hold In the remainder of this proof we assume that we are in this good case, in which all small sets S and T have large vertex expansion Now, choose an arbitrary a A r+, and define S 0, S, S 2, as the endpoints of all alternating paths starting from a and of lengths 0, 2, 4, That is, S 0 {a} and S i φ NS i Since we are in the good case, S i 4 S i provided S i n/5, and so there exists a smallest index i S such that S is > n/5, and i S log 4 n/5 log 4 n Arbitrarily discard vertices from S is to create a smaller set S i S with S i S n/5, so that S i S NS i S has cardinality S i S 4 S i S 4n/5 Similarly, for an arbitrary b B, define T 0, T,, by T 0 {b} and T i φnt i 8

Again, we will find an index i T log 4 n whose modified set has cardinality T i T 4n/5 With both S i S and T i T larger than n/2, there must be some a S i S for which b φa T i T This establishes the existence of an alternating walk and hence removing any cycles an alternating path of length at most 2i S + i T 2 log 4 n from a to b in Γ r We will need the following lemma, Lemma 7 Suppose that k +k 2 + +k M a log N, and X, X 2,, X M are independent random variables with Y i distributed as the k i th minimum of N independent exponential rate one random variables If µ > then Pr X + + X M µa log N N a+log µ µ N a log N Proof Let Y k denote the kth smallest of Y, Y 2,, Y N, where we assume that k Olog N Then the density function f k x of Y k is N f k x k e x k e xn k+ dx k and hence the ith moment of Y k is given by [ ] N E Yk i kx i e x k e xn k+ dx 0 k N kx i+k e xn k+ dx 0 k N i + k! k k N k + i+k k 2 kk + i + k + O N N k + i Thus, if 0 t < N k +, E [ e ty ] k k 2 + O N i0 t N k + i k + O i If Z X + X 2 + + X M then if 0 t < N a log N, It follows that Pr Z E [ e tz] M E [ e tx ] i i µa log N N a log N k 2 N t a log N N a log N t a log N { exp N a log N We put t N a log N /µ to minimise the above expression, giving µa log N Pr Z µe µ a log N N a log N t k N k + tµa log N N a log N } 9

Lemma 8 Whp, for all φ, the weighted ab-diameter of log n Γ r is at most c for some absolute contant c > 0 Proof Let { k } k Z max wa i, b i wb i, a i+, 2 i0 where the maximum is over sequences a 0, b 0, a,, a k, b k where a i, b i is one of the 40 shortest arcs leaving a i for i 0,,, k k 0 3 log 4 n, and b i, a i+ is a backwards matching edge We compute an upper bound on the probability that Z is large For any ζ > 0 we have Pr Z ζ ln n where i0 k 0 + o k+ o + r + n k+ k0 y ln n k k! y0 ρ 0 +ρ + +ρ k 40k+ qρ 0, ρ,, ρ k ; η Pr X 0 + X + + X k η log n, qρ 0, ρ,, ρ k ; ζ + y dy X 0, X,, X k are independent and X j is distributed as the ρ j th minimum of r independent exponential random variables When k 0 there is no term k! y log n n k Explanation: The o term is for the probability that there is a vertex in V r that has fewer than o neighbors in V r We have at most r + n k+ choices for the sequence k a 0, b 0, a,, a k, b k The term y ln n k! dy bounds the probability that the sum of k independent exponentials, wb 0, a + + wb k, a k, is in ln n [y, y + dy] The density function for the sum of k independent exponentials is xk e x k! We integrate over y +o is the probability that a i, b i is the ρ i th shortest edge leaving a i, and these events are independent for 0 i k The final summation bounds the probability that the associated edge lengths sum to at least ζ+y ln n It follows from Lemma 7 that if ζ is sufficiently large then, for all y 0, qρ,, ρ k ; ζ + y ζ+y log n/2 log n ζ+y/2 Since the number of choices for ρ 0, ρ,, ρ k is at most 4k+40 k+ the number of positive integral solutions to a 0 + a + + a k+ 40k + we have Pr Z ζ ln n k 0 2n 2 ζ/2 ln n k 4k + 40 y k n y/2 dy k! k + k0 y0 k 0 2n 2 ζ/2 ln n k 84e k Γk k! ln n k0 2n 2 ζ/2 k 0 + 84e k 0+2 on 4 0

Lemma 8 shows that with probability on 2 in going from M r to M r+ we can find an augmenting path of weight at most c log n This completes the proof of Lemma 5 and Theorem Note that to go from whp to expectation we use the fact that whp we Olog n for all e A B, Notice that in the proof of Lemmas 6 and 8 we can certainly make the failure probability less than n anyconstant 3 Proof of Theorem 2 The proof of Lemma 8 allows us to claim that with probability On anyconstant the maximum log n length of an edge in the minimum cost perfect matching of G is at most µ c 2 for some constant c 2 > 0 We can now proceed as in Talagrand s proof of concentration for the assignment problem We let ŵe min {we, µ} and let ĈG be the assignment cost using ŵ in place of w We observe that PrĈG CG On anyconstant 22 and so it is enough to prove concentration of ĈG For this we use the following result of Talagrand [4]: consider a family F of N-tuples α α i i N of non-negative real numbers Let Z min α i X i α F where X, X 2,, X N are an independent sequence of random variables taking values in [0, ] Let σ max α F α 2 Then if M is the median of Z and u > 0, we have i N } Pr Z M u 4 exp { u2 4σ 2 23 We apply 23 with N n 2 and X e ŵe/µ For F we take the n! {0, } vectors corresponding to perfect matchings and scale them by µ In this way, e α ex e will be the weight of a perfect matching In this case we have σ 2 nµ 2 Applying 23 we obtain Pr ĈG M ε } { 4 exp { ε2 p 4p 2 nµ 2 exp ε2 n c 2 log n 2 where M is the median of ĈG Theorem 2 follows easily from 22 and 24 }, 24 4 Final remarks We have generalised the result of [2] to random bipartite graphs It would seem that in the absence of proving Conjecture we should be able to replace ωlog n 2 in Theorem by ω log n It would be of interest to prove the analogous result for G n,p Here we would expect to find that the expected cost of a minimum matching was asymptotically π2 2p, given that Wästlund has proved this for p in [5]

References [] D Aldous, Asymptotics in the random assignment problem, Probability Theory and Related Fields 93 992 507-534 [2] D Aldous, The ζ2 limit in the random assignment problem, Random Structures and Algorithms 4 200 38-48 [3] J Beardwood, J H Halton and J M Hammersley, The shortest path through many points, Mathematical Proceedings of the Cambridge Philosophical Society 55 959 299-327 [4] A Beveridge, AM Frieze and CMcDiarmid, Random minimum length spanning trees in regular graphs, Combinatorica 8 998 3-333 [5] C Cooper, AM Frieze, N Ince, S Janson and J Spencer, On the length of a random minimum spanning tree, see arxivorg [6] AM Frieze, On the value of a random minimum spanning tree problem, Discrete Applied Mathematics 0 985 47-56 [7] AM Frieze, M Ruszinko and L Thoma, A note on random minimum length spanning trees, Electronic Journal of Combinatorics 7 2000 R4 [8] S Janson, The minimal spanning tree in a complete graph and a functional limit theorem for trees in a random graph, Random Structures Algorithms 7 995 337-355 [9] S Janson, One, two and three times log n/n for paths in a complete graph with random weights, Combinatorics, Probability and Computing 8 999 347-36 [0] RM Karp, An upper bound on the expected cost of an optimal assignment, Discrete Algorithms and Complexity: Proceedings of the Japan-US Joint Seminar D Johnson et al, eds, Academic Press, New York, 987, -4 [] S Linusson and J Wästlund, A proof of Parisi s conjecture on the random assignment problem, Probability Theory and Related Fields 28 2004 49-440 [2] C Nair, B Prabhakar and M Sharma, Proofs of the Parisi and Coppersmith-Sorkin random assignment conjectures, Random Structures and Algorithms 27 2005 43-444 [3] G Parisi, A conjecture on Random Bipartite Matching, Pysics e-print archive 998 [4] M Talagrand, Concentration of measure and isoperimetric inequalities in product spaces, Publications Mathmatiques de L IHÉS 995 [5] J Wästlund, Random matching problems on the complete graph, Electronic Communications in Probability 2008 258-265 [6] J Wästlund, A simple proof of the Parisi and Coppersmith-Sorkin formulas for the random assignment problem, Linköping Studies in Mathematics 6 2005 [7] J Wästlund, An easy proof of the ζ2 limit in the random assignment problem, Elec Comm in Probab 4 2009 26 269 2

[8] J Michael Steele, Probability Theory and Combinatorial Optimization, SIAM CBMS series, 996 [9] DW Walkup, On the expected value of a random asignment problem, SIAM Journal on Computing 8 979 440-442 [20] RM Young, Euler s constant, Mathematical Gazette 75 99 87-90 3