Lecture 8: The Goemans-Williamson MAXCUT algorithm


IU Summer School
Lecture 8: The Goemans-Williamson MAXCUT algorithm
Lecturer: Igor Gorodezky

The Goemans-Williamson algorithm is an approximation algorithm for MAX-CUT based on semidefinite programming. In these two lectures we will present the algorithm, analyze it, and determine the integrality gap of the underlying semidefinite program.

Introduction

Consider a graph G = (V, E, w) with edge weights w_ij ≥ 0. Given S ⊆ V, let the value of the cut induced by S be the quantity ∑_{i∈S, j∉S} w_ij. MAX-CUT is the problem of finding S ⊆ V that induces a cut of maximum value. We will write opt(G) for the maximum value of a cut in G.

MAX-CUT is NP-hard (the decision version was one of Karp's 21 NP-complete problems), so it is unlikely that it can be solved exactly in polynomial time. Therefore, research on MAX-CUT has mostly revolved around approximation algorithms for the problem; an algorithm is a c-approximation for MAX-CUT, where c ≤ 1, if it is guaranteed to produce a cut whose value is at least c · opt(G). We call c the approximation ratio. In this lecture we will present randomized approximation algorithms, which are often easier to analyze than their deterministic counterparts. We say that a randomized algorithm is a c-approximation for MAX-CUT if the expected value of the cut it outputs is at least c · opt(G).

There is a simple randomized algorithm for MAX-CUT that produces a cut whose expected value is at least (1/2) opt(G): sample a random S by including every vertex independently with probability 1/2. The probability that an edge (i, j) is in the cut induced by S is the probability that i ∈ S, j ∉ S or vice-versa, which is exactly 1/2. Therefore

    E[ ∑_{i∈S, j∉S} w_ij ] = ∑_{(i,j)∈E} w_ij · Pr[(i, j) in cut] = (1/2) ∑_{(i,j)∈E} w_ij ≥ (1/2) opt(G).

This algorithm can be derandomized (using the method of conditional expectation) to yield a deterministic 1/2-approximation algorithm.¹
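To make the simple randomized algorithm concrete, here is a minimal Python sketch (the graph encoding as a list of (i, j, w) triples and the function names are my own, not from the lecture):

```python
import random

def random_cut(n, weighted_edges, seed=None):
    """Sample S by including each vertex independently with probability 1/2,
    and return (S, value of the cut induced by S)."""
    rng = random.Random(seed)
    S = {v for v in range(n) if rng.random() < 0.5}
    value = sum(w for (i, j, w) in weighted_edges
                if (i in S) != (j in S))  # edge is cut iff endpoints split
    return S, value

# 4-cycle with unit weights: opt = 4, and each edge is cut with
# probability 1/2, so the expected cut value is 2 = (1/2) * opt.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0)]
avg = sum(random_cut(4, edges, seed=t)[1] for t in range(2000)) / 2000
```

Averaging over many trials, `avg` concentrates near 2, matching the (1/2) opt(G) guarantee for this graph.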
¹ In 1976, Sahni & Gonzalez gave a greedy deterministic 1/2-approximation algorithm for MAX-CUT that is equivalent to this derandomization.

Figure 1: cos⁻¹(x)/π versus (1 − x)/2; the ratio of the former to the latter is minimized at x ≈ −0.689.

The 80s and early 90s saw the publication of several algorithms for MAX-CUT with approximation ratio 1/2 + O(1/p), where p is some graph parameter (such as |V|, |E|, maximum degree, etc.), but an approximation ratio universally better than 1/2 was not achieved until the breakthrough paper [GW] of Goemans & Williamson in 1994. In that paper, the authors give an algorithm based on semidefinite programming (which we describe below) with approximation ratio

    α = min_{−1≤x≤1} (cos⁻¹(x)/π) / ((1 − x)/2) ≈ 0.87856.²    (1)

Indeed, the MAX-CUT algorithm in [GW] was the first of many fruitful applications of semidefinite programming to approximation algorithms; for many important optimization problems, the algorithm that achieves the best known approximation ratio is SDP-based.

² The proof that this minimum exists is nontrivial but straightforward; see Figure 1 for empirical evidence and [GW] for the proof.

1 The algorithm

In this section we describe and analyze the Goemans-Williamson (GW) algorithm. Our treatment follows that in the forthcoming book [WS]. In order to describe the algorithm, we first give the necessary background on semidefinite programming.
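The constant α in equation (1) is easy to estimate numerically; the following grid-search sketch (my own, not part of the lecture) recovers both the minimum value and the minimizer x ≈ −0.689 shown in Figure 1:

```python
import math

# Grid-search the ratio [acos(x)/pi] / [(1-x)/2] over x strictly inside (-1, 1).
best_x, alpha = None, float("inf")
for k in range(1, 200000):
    x = -1 + 2 * k / 200000
    ratio = (math.acos(x) / math.pi) / ((1 - x) / 2)
    if ratio < alpha:
        best_x, alpha = x, ratio
```

Running this gives `alpha` ≈ 0.87856 attained near `best_x` ≈ −0.689, in agreement with equation (1).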

1.1 Semidefinite and Vector Programming

A semidefinite program (SDP) is an optimization problem that seeks to optimize a linear function of the entries of a positive-semidefinite matrix. Rather than defining exactly what this means, we will restrict our discussion to vector programs, which are a type of semidefinite program (for details on semidefinite vs. vector programming, see a textbook such as [Va] or the upcoming [WS]).

A vector program (VP) is an optimization problem with vector variables in which one seeks to optimize a linear function of those vectors' dot products, subject to inequalities that are also linear in the dot products. In symbols, a VP has n vector variables v_1, ..., v_n ∈ R^n, and is of the form

    min or max  ∑_{i,j} c_ij (v_i · v_j)
    subject to  ∑_{i,j} a_ijk (v_i · v_j) ≤ b_k,   k = 1, ..., m
                v_i ∈ R^n,   i = 1, ..., n

where the a_ijk, c_ij, b_k are real numbers. Clearly, the inequality can be reversed or made an equality. Observe that the dimension of the vector variables is exactly the number of variables.

As noted above, vector programs are a type of semidefinite program; semidefinite programs can be solved in polynomial time, so the same is true of vector programs.³ The interested reader is directed to [GW] or textbooks such as [Va] and [WS] for references on semidefinite programming.

³ Actually, vector programs are solved to within an arbitrary additive error ɛ, with the running time dependent on log(1/ɛ); we will ignore this error in these notes.

VPs are useful because they can be used to relax integer programs; a relaxation of a problem in which one is optimizing a function over some solution space is a problem in which one optimizes the same function over a larger space. This larger space is chosen so that the latter problem can be solved efficiently even if the former problem cannot. Then, a solution to the latter problem is transformed into an approximate solution to the former. Let us now demonstrate this paradigm by describing the GW algorithm.

1.2 The algorithm

Consider the following formulation of MAX-CUT as a quadratic integer program:

    max  ∑_{(i,j)} w_ij (1 − y_i y_j)/2
    s.t. y_i ∈ {−1, 1},   i = 1, ..., n        (2)

The optimization problem above is equivalent to MAX-CUT because given S ⊆ V, setting y_i = 1 for i ∈ S and y_j = −1 for j ∉ S gives a feasible solution to the program with value

    ∑_{(i,j)} w_ij (1 − y_i y_j)/2 = ∑_{i∈S, j∉S} w_ij.

On the other hand, given a solution to the program (2), defining S = {i : y_i = 1} produces a cut whose value is exactly the value of the objective function. Therefore, the optimal value of the program is exactly opt(G).

Solving quadratic integer programs is NP-hard. However, consider the following VP, which as we noted is solvable in polynomial time:

    max  ∑_{(i,j)} w_ij (1 − v_i · v_j)/2
    s.t. ‖v_i‖ = 1,   i = 1, ..., n            (3)
         v_i ∈ R^n,   i = 1, ..., n

We will refer to this as the MAX-CUT SDP (and not the MAX-CUT VP, just to remain consistent with the literature). Note that a solution to this SDP is a set of vectors that lie on the n-dimensional unit sphere. Let sdp(G) denote the value of the MAX-CUT SDP for a given graph G. In the next lemma, we show that this SDP is a relaxation of the program (2).

Lemma 1  For any graph G, sdp(G) ≥ opt(G).

Proof. Since opt(G) is the optimal value of the program (2), it suffices to show that to any feasible solution {y_i}_{i∈V} of that program there corresponds a solution {v_i}_{i∈V} of the SDP (3) with the same value. This is simple: define v_i = (y_i, 0, ..., 0). Now ‖v_i‖² = y_i² = 1, and moreover

    ∑_{(i,j)} w_ij (1 − v_i · v_j)/2 = ∑_{(i,j)} w_ij (1 − y_i y_j)/2,

as desired.

The GW algorithm uses a solution to the SDP (3) to sample a random cut in the graph whose (expected) value is provably close to opt(G). Observe that this algorithm is randomized, just like the very first MAX-CUT algorithm we described. Sampling the cut is done, informally, as follows:

Goemans-Williamson algorithm (informal):

1. Given G = (V, E, w), solve the MAX-CUT SDP.
2. Uniformly sample a hyperplane in R^n through the origin.
3. Define S to be the set of i such that v_i is above the hyperplane.

In order to make this rigorous, we need a way to uniformly sample hyperplanes through the origin. We first observe that sampling hyperplanes uniformly is equivalent to sampling unit vectors uniformly: the vector is the normal to the hyperplane. We can sample unit vectors uniformly as follows: sample a random vector r ∈ R^n by drawing each component from N(0, 1), the normal distribution with mean 0 and variance 1, and then normalize r.

Fact 2  r/‖r‖ is uniformly distributed over the n-dimensional unit sphere.

Proof. The probability density function (pdf) of the ith component is (1/√(2π)) exp(−r_i²/2), so the pdf of r is

    ∏_{i=1}^{n} (1/√(2π)) e^{−r_i²/2} = (2π)^{−n/2} exp(−‖r‖²/2),

which depends only on the length of r. It follows that the probability that r points in any one direction is the same as the probability that it points in any other.

Now we can formally state the GW algorithm.

Goemans-Williamson algorithm:

1. Given G = (V, E, w), solve the MAX-CUT SDP.
2. Sample r as above.
3. Define S to be the set of i such that v_i · r ≥ 0.

It remains to prove this algorithm's performance guarantee.
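Steps 2 and 3 are straightforward to implement once the SDP vectors are in hand; here is a pure-Python sketch of the rounding step (function and variable names are my own). For illustration it rounds the exact 2-dimensional SDP solution for a single edge, the antipodal pair v_0 = (1, 0), v_1 = (−1, 0), which every hyperplane through the origin separates:

```python
import random

def round_with_hyperplane(vectors, seed=None):
    """Hyperplane rounding: sample r with i.i.d. N(0,1) coordinates and
    put i in S iff v_i . r >= 0."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    r = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return {i for i, v in enumerate(vectors)
            if sum(vk * rk for vk, rk in zip(v, r)) >= 0}

vectors = [(1.0, 0.0), (-1.0, 0.0)]   # antipodal pair: angle pi between them
S = round_with_hyperplane(vectors, seed=1)
```

Since v_1 · r = −(v_0 · r), exactly one of the two vertices lands in S (barring the probability-zero event v_0 · r = 0), so this edge is always cut.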

Theorem 3  The expected value of the cut produced by GW is at least α · opt(G).

For brevity, write opt(G) as opt and let alg be the value of the cut produced by the GW algorithm. Rephrasing the theorem, we want to prove that E[alg] ≥ α · opt. By linearity of expectation,

    E[alg] = E[ ∑_{i∈S, j∉S} w_ij ] = ∑_{(i,j)∈E} w_ij Pr[i, j are cut by r]    (4)

where by "i, j are cut by r" we mean that either v_i · r ≥ 0 and v_j · r < 0, or v_j · r ≥ 0 and v_i · r < 0. Let us compute this probability. We will need the following fact about normal distributions.

Fact 4  Let u_1, u_2 be two orthogonal unit vectors. Then, with r as above, r · u_1 and r · u_2 are independent random variables distributed as N(0, 1).

Proof. This is a restatement of Theorem IV.16.3 in [Ré]; a proof also follows from the relevant results in [Fe].

Lemma 5  The probability that i, j are cut by r is (1/π) cos⁻¹(v_i · v_j).

Proof. Let p be the probability in question. Let r′ be the projection of r onto the plane P defined by v_i and v_j; by Fact 4, the components of r′ are drawn independently from N(0, 1). Therefore, by the same reasoning we used to prove Fact 2, the normalization of r′ is a uniformly distributed unit vector in P. Since r′ is the projection of r onto P, we have v_i · r = v_i · r′ and v_j · r = v_j · r′. Therefore, p is exactly the probability that for a random unit vector r′ in P, either v_i · r′ ≥ 0 and v_j · r′ < 0, or v_j · r′ ≥ 0 and v_i · r′ < 0. If θ is the angle between v_i and v_j, it is easy to deduce from Figure 2 that this probability is exactly θ/π. But θ = cos⁻¹(v_i · v_j), so we are done.

Proof of Theorem 3. Returning to equation (4) and applying Lemma 5,

    E[alg] = ∑_{(i,j)∈E} w_ij Pr[i, j are cut by r] = ∑_{(i,j)∈E} w_ij · (1/π) cos⁻¹(v_i · v_j).

Figure 2: Two unit vectors v_i and v_j with an angle of θ between them. In the lightly shaded sectors we have either v_i · r ≥ 0 or v_j · r ≥ 0, but in the heavily shaded sector both are true.

But for x ∈ [−1, 1],

    (1/π) cos⁻¹(x) = [ (cos⁻¹(x)/π) / ((1 − x)/2) ] · (1 − x)/2
                   ≥ [ min_{−1≤x≤1} (cos⁻¹(x)/π) / ((1 − x)/2) ] · (1 − x)/2
                   = α · (1 − x)/2.

Therefore,

    E[alg] = ∑_{(i,j)∈E} w_ij · (1/π) cos⁻¹(v_i · v_j) ≥ ∑_{(i,j)∈E} w_ij · α · (1 − v_i · v_j)/2 = α · sdp.

Since sdp ≥ opt (Lemma 1), we have E[alg] ≥ α · opt, as desired.

The GW algorithm was derandomized by Mahajan & Ramesh in [MR]. We conclude that there exists a deterministic polynomial-time approximation algorithm for MAX-CUT with approximation ratio α.
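Lemma 5 is easy to test empirically; the following Monte Carlo sketch (my own, not from the lecture) estimates the probability that a random Gaussian vector r separates two unit vectors at angle θ, which should approach θ/π:

```python
import math
import random

def cut_probability(theta, trials=200000, seed=0):
    """Estimate Pr[sign(v_i . r) != sign(v_j . r)] for unit vectors at
    angle theta in the plane, with r a 2-dimensional Gaussian vector."""
    vi = (1.0, 0.0)
    vj = (math.cos(theta), math.sin(theta))
    rng = random.Random(seed)
    cut = 0
    for _ in range(trials):
        r = (rng.gauss(0, 1), rng.gauss(0, 1))
        si = vi[0] * r[0] + vi[1] * r[1] >= 0
        sj = vj[0] * r[0] + vj[1] * r[1] >= 0
        if si != sj:
            cut += 1
    return cut / trials

est = cut_probability(2.0)   # target value: theta / pi = 2 / pi
```

By Fact 4, restricting to the plane spanned by v_i and v_j loses nothing, so a 2-dimensional experiment suffices here.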

2 The Integrality Gap

The GW algorithm is pretty good, but can it be improved? Is there, perhaps, a more clever SDP-based algorithm for MAX-CUT that achieves an approximation ratio better than α? In this section we prove a result due to Feige & Schechtman [FS] that provides strong evidence to the contrary.

To state the result, let us make some definitions. Recall that for a graph G, opt(G) is the value of an optimal solution to MAX-CUT on G and sdp(G) is the optimal value of the MAX-CUT SDP (3) on G. We define alg(G) to be the value of the cut produced by the (derandomized) GW algorithm on G. Then we have

    α · sdp(G) ≤ alg(G) ≤ opt(G) ≤ sdp(G)

where the first inequality follows from Theorem 3 and the third from Lemma 1. Define the integrality gap of the MAX-CUT SDP to be

    gap = inf_G opt(G)/sdp(G).

By the above, we have α ≤ gap ≤ 1. This quantity measures the efficacy of MAX-CUT algorithms based on the MAX-CUT SDP. In [GW], the authors mention the observation by Delorme and Poljak that if C_5 is the 5-cycle then opt(C_5)/sdp(C_5) ≤ 0.8845 (here w_ij = 1 for (i, j) ∈ E and 0 otherwise).

Lemma 6  opt(C_5)/sdp(C_5) ≤ 0.8845.

Proof. Clearly opt(C_5) = 4. We can bound sdp(C_5) from below by the value of a feasible solution to the SDP. Arrange v_1 through v_5 in the plane as in Figure 3. The angle between any v_i and v_j that correspond to an edge is 4π/5, so the value of the SDP is at least

    ∑_{(i,j)∈E} (1 − cos(4π/5))/2 = 5 · (1 − cos(4π/5))/2 ≈ 4.5225.

Thus sdp(C_5) ≥ 4.5225 and the claim follows.

It follows from the lemma that gap ≤ 0.8845. This is awfully close to α, and indeed, a few years after the publication of the GW algorithm, Feige & Schechtman proved in [FS] that gap = α. In particular, they proved the following.

Theorem 7  For every ɛ > 0 there exists a graph G for which opt(G)/sdp(G) ≤ α + ɛ.

In this section we will prove this theorem; for the sake of simplicity, we will prove that opt(G)/sdp(G) ≤ α + O(ɛ).
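The arithmetic in Lemma 6 takes one line to verify (a sketch of my own):

```python
import math

# Feasible SDP value for C_5: five edges, each pair of endpoint
# vectors at angle 4*pi/5 in the plane (Figure 3).
sdp_lower = 5 * (1 - math.cos(4 * math.pi / 5)) / 2
ratio_bound = 4 / sdp_lower      # opt(C_5) = 4
```

This gives `sdp_lower` ≈ 4.5225 and hence opt(C_5)/sdp(C_5) ≤ `ratio_bound` ≈ 0.8845.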

Figure 3: A feasible solution to the MAX-CUT SDP for C_5.

2.1 Definitions and Parameters

We will define G by embedding its vertices on the surface of the d-dimensional unit sphere; this embedding provides a feasible solution of the MAX-CUT SDP, which we will use to bound sdp(G) from below. Our strategy will be to define an infinite graph G̃ on the unit sphere, and then discretize it. First, some definitions.

Given d, define S^{d−1} to be the surface of the d-dimensional unit sphere: S^{d−1} = {v ∈ R^d : ‖v‖ = 1}. We will often refer to the measure of a subset of S^{d−1}; this is the natural (d−1)-dimensional Lebesgue measure, normalized so that the measure of S^{d−1} is 1. Given u, v ∈ S^{d−1}, recall that cos⁻¹(u · v) is the angle between u and v. We will treat this quantity as a metric on S^{d−1}. Balls with respect to this metric will be called caps. In other words, a cap of radius θ centered at u is the set of v such that cos⁻¹(u · v) ≤ θ. A cap of radius π/2 is a hemisphere.

Let x* be the value that achieves the minimum in equation (1) and set θ* = cos⁻¹(x*), so that

    (θ*/π) / ((1 − cos θ*)/2) = α.    (5)

To construct G and G̃, we choose parameters θ_1, θ_2, d, γ as follows:

Choose parameters:

1. Choose θ_1 and θ_2 such that

       (θ/π) / ((1 − cos θ)/2) ≤ α + ɛ

   for θ ∈ [θ_1, θ_2].

2. Choose d such that in S^{d−1}, the measure of a cap of radius π − θ_2 is an ɛ-fraction of the measure of a cap of radius π − θ_1.

3. Choose γ such that the measure of a cap of radius π − θ_1 − γ is at least a (1 − ɛ)-fraction of the measure of a cap of radius π − θ_1 + γ, and also |cos(θ ± γ) − cos θ| ≤ ɛ for every θ.

The proof that d can be chosen as above is straightforward; it relies on the fact that the measure of a cap of radius ϕ in S^{d−1} can be bounded below by the volume of a ball in R^{d−1} of radius sin ϕ, and bounded above by the surface area of a hemisphere in R^d of radius sin ϕ; see Lemma 9 in [FS] for details.⁴ Next, we define G̃.

⁴ In fact, this is a simple manifestation of the phenomenon of measure concentration on the sphere.

2.2 The infinite graph G̃

Given the parameters chosen in the previous section, consider the infinite graph G̃ whose vertex set is S^{d−1}, and where two vertices u, v are connected by an edge if cos⁻¹(u · v) ≥ θ_1. Let Ẽ be the set of edges. Let µ be the natural, normalized (d−1)-dimensional measure on S^{d−1}, and let µ² be the induced product measure µ × µ on S^{d−1} × S^{d−1}. We define the following continuous analogues of sdp and opt for G̃:

    s̃dp = ∫_{(u,v)∈Ẽ} (1 − u · v)/2 dµ²    and

    õpt = max_{A⊆S^{d−1}} µ²{ (u, v) ∈ Ẽ : u ∈ A, v ∉ A }.

The quantity being maximized in õpt is a continuous analogue of the value of a cut, and we will often refer to it as the measure of a cut in G̃. We claim that õpt/s̃dp ≤ α + O(ɛ); once we show this, we will discretize G̃ to get G and prove Theorem 7 by showing that opt(G)/sdp(G) is approximately õpt/s̃dp.

To begin, observe that since u · v ≤ cos(θ_1) for every (u, v) ∈ Ẽ, we have

    s̃dp ≥ (1 − cos θ_1)/2 ≥ (1 − cos θ*)/2 − O(ɛ)

where the last inequality is by our choice of θ_1. Therefore, to prove õpt/s̃dp ≤ α + O(ɛ) it suffices to show that õpt ≤ θ*/π + O(ɛ). To this end, given 0 ≤ ρ ≤ π and some A ⊆ S^{d−1}, define

    µ_ρ(A) = µ²{ (u, v) : u ∈ A, v ∉ A, and cos⁻¹(u · v) ≥ ρ }.

Observe that µ_ρ(A) = µ_ρ(Ā) and that

    õpt = max_{A⊆S^{d−1}} µ_{θ_1}(A).    (6)

The following is Theorem 5 in [FS].

Theorem 8  Given 0 ≤ a ≤ 1 and 0 ≤ ρ ≤ π, the maximum of µ_ρ(A) over all subsets of S^{d−1} of measure a is attained by a cap of measure a.

This is a type of isoperimetric inequality on the sphere. We defer to [FS] for the highly nontrivial proof.

Corollary 9  Given 0 ≤ ρ ≤ π, the maximum of µ_ρ(A) over all subsets of S^{d−1} is attained by a hemisphere.

Proof. Let β be the maximum value of µ_ρ(A); by the theorem, it is attained by a cap C. Thus it suffices to show that there exists a hemisphere H with µ_ρ(H) ≥ µ_ρ(C). Assume that µ(C) < 1/2 (otherwise replace C with C̄) and let z be the center of C. Define H to be the hemisphere centered at z. We claim that for every x ∈ H \ C, the measure of edges from x to H̄ is at least the measure of edges from x to C. This is because if x has an edge to some y ∈ C, then x also has an edge to the reflection of y across the hyperplane that separates H from H̄ (and this reflection is, by definition, in H̄). This claim implies µ_ρ(H) ≥ µ_ρ(C), as desired.

Thus µ_ρ(A) is maximized when A is a hemisphere, and by symmetry, this maximum value is attained by every hemisphere. If we set ρ = θ_1, we find that the maximum measure of a cut is attained by a(ny) hemisphere. By equation (6), for any hemisphere H we have

    µ_{θ_1}(H) = õpt.    (7)

We will use this fact to bound õpt, but first we must modify G̃ a bit: remove from G̃ all edges (u, v) where cos⁻¹(u · v) > θ_2, so that all edges are between vector pairs whose angle is in [θ_1, θ_2]. Now, it's easy to check that on S^{d−1}, the µ²-measure of pairs (u, v) whose angle is greater than some ρ is exactly the measure of a cap of radius π − ρ. Thus by our choice of d, the fraction of edges removed from G̃ is at most ɛ. Note that this edge removal changes s̃dp by at most ɛ, so that we still have s̃dp ≥ (1 − cos θ*)/2 − O(ɛ).

As for õpt, let us make a distinction between the value of õpt before we removed edges, written õpt_old, and the new value of õpt, which we write as simply õpt. Averaging over all hemispheres and repeating the analysis of the GW algorithm, we find that in the new G̃ the expected measure of a cut induced by a hemisphere is at most θ_2/π. But the value of this expectation changed by at most ɛ, again by our choice of d. Since the value of this expectation in the old G̃ was exactly õpt_old by equation (7), we have õpt_old ≤ θ_2/π + ɛ. But õpt ≤ õpt_old, since all we did was remove edges! Therefore

    õpt ≤ θ_2/π + ɛ ≤ θ*/π + O(ɛ)

as desired. We summarize this section in a lemma.

Lemma 10  If G̃ is the infinite graph with vertex set S^{d−1} in which (u, v) is an edge if cos⁻¹(u · v) ∈ [θ_1, θ_2], and õpt, s̃dp are defined as above, then õpt/s̃dp ≤ α + O(ɛ).

Proof. By the discussion above,

    õpt/s̃dp ≤ (θ*/π + O(ɛ)) / ((1 − cos θ*)/2 − O(ɛ)) ≤ (θ*/π) / ((1 − cos θ*)/2) + O(ɛ) = α + O(ɛ).

2.3 Discretizing G̃

We now discretize G̃. We use the following result, which is Lemma 21 in [FS].

Lemma 11  For any 0 < γ < π/2, the sphere S^{d−1} can be partitioned into n = (O(1)/γ)^d cells of equal measure, each of diameter at most γ.

Since we normalized our measure, the measure of each cell is 1/n. We again defer to [FS] for the proof. We note that the proof in [FS] is not entirely constructive. This is not an issue for us, but algorithmic proofs do exist elsewhere in the literature (e.g. [Le]).
Now we are ready to define the finite graph G = (V, E) that will be used in the proof of Theorem 7. We do this as follows:

1. Given parameters θ_1, θ_2, d, γ as before, partition S^{d−1} into n = (O(1)/γ)^d cells of equal measure and diameter at most γ.
2. Pick an arbitrary v_i from each cell C_i and add it to V.
3. Add (v_i, v_j) to E if cos⁻¹(v_i · v_j) ∈ [θ_1, θ_2].

In other words, we construct G from G̃ by choosing (arbitrarily) one vertex from each cell of the partition in Lemma 11, and retaining whatever edges existed in G̃. Let us abbreviate opt(G) and sdp(G) by opt and sdp, respectively. We must now relate sdp to s̃dp, and opt to õpt.

Theorem 12  sdp/n² ≥ s̃dp − O(ɛ) and opt/n² ≤ õpt + O(ɛ).

Proof. Let us begin with the first inequality. Recall that

    s̃dp = ∫_{(u,v)∈Ẽ} (1 − u · v)/2 dµ²    (8)

where Ẽ is the set of edges left after the removal of long edges (see Section 2.2), and

    sdp ≥ ∑_{(v_i,v_j)∈E} (1 − v_i · v_j)/2    (9)

because the vertices of G are by construction embedded in the unit sphere, and this embedding gives a feasible solution to the MAX-CUT SDP for G. Let us, for the sake of brevity, abuse notation by referring to the quantity on the right-hand side of (9) as sdp.

Given the partition of S^{d−1} into cells {C_i} in Lemma 11, we can classify cell pairs C_i, C_j as follows:

Near pairs (NP): cos⁻¹(u_i · u_j) < θ_1 for all u_i ∈ C_i and u_j ∈ C_j.

Far pairs (FP): cos⁻¹(u_i · u_j) > θ_2 for all u_i ∈ C_i and u_j ∈ C_j.

Near mixed pairs (NMP): cos⁻¹(u_i · u_j) < θ_1 for some u_i ∈ C_i, u_j ∈ C_j but cos⁻¹(u_i · u_j) ≥ θ_1 for others.

Far mixed pairs (FMP): cos⁻¹(u_i · u_j) > θ_2 for some u_i ∈ C_i, u_j ∈ C_j but cos⁻¹(u_i · u_j) ≤ θ_2 for others.

Active pairs (AP): cos⁻¹(u_i · u_j) ∈ [θ_1, θ_2] for all u_i ∈ C_i, u_j ∈ C_j.

Since the cells {C_i} partition the sphere, we can decompose the integral in the definition of s̃dp (8) into five integrals, each over one of the above cell-pair types. We can similarly partition the sum in the definition of sdp (9), since each vertex of G belongs to some cell. Now, near pairs and far pairs clearly do not contribute to s̃dp or sdp, since there are no edges spanning such cell pairs in either G̃ or G. Therefore,

    s̃dp = ∫_{NMP} (1 − u · v)/2 dµ² + ∫_{FMP} (1 − u · v)/2 dµ² + ∫_{AP} (1 − u · v)/2 dµ²    (10)

and

    sdp = ∑_{NMP} (1 − v_i · v_j)/2 + ∑_{FMP} (1 − v_i · v_j)/2 + ∑_{AP} (1 − v_i · v_j)/2.    (11)

Let us examine these three types of cell pairs, starting with NMPs. The sum over NMPs in (11) is at least 0, and the integral over NMPs in (10) is at most the measure of all NMPs. We claim that this measure is at most ɛ. To see this, observe that since the diameter of any cell is at most γ, a given cell participates in an NMP only with cells contained in the difference of two caps, one of radius π − θ_1 − γ and the other of radius π − θ_1 + γ. By our choice of γ, the measure of this region is at most ɛ. Thus, each cell contributes at most ɛ/n to the measure of all NMPs (recall that the measure of each cell is 1/n), and the total measure of all NMPs is at most ɛ.

We can examine FMPs in exactly the same manner; the sum over FMPs in (11) is at least 0, while the integral over FMPs in (10) is at most ɛ by an argument similar to the one for NMPs. Putting it all together, we have

    s̃dp ≤ ∫_{AP} (1 − u · v)/2 dµ² + 2ɛ    (12)

and

    sdp ≥ ∑_{AP} (1 − v_i · v_j)/2.    (13)

Let us consider, then, an active pair C_i, C_j whose vertices in G are v_i and v_j, respectively, and set θ = cos⁻¹(v_i · v_j). These two vertices contribute (1 − cos θ)/2 to the sum in (13). Now, every pair u_i ∈ C_i, u_j ∈ C_j makes an angle of θ ± γ, since the diameter of a cell is at most γ, so the contribution of this pair of cells to the integral in (12) is in the range (1 − cos(θ ± γ))/(2n²).
But by our choice of γ,

    (1 − cos(θ ± γ))/(2n²) ≤ (1 − cos θ)/(2n²) + ɛ/n²,

which implies

    n²(s̃dp − 2ɛ) ≤ n² ∫_{AP} (1 − u · v)/2 dµ² ≤ ∑_{AP} ( (1 − v_i · v_j)/2 + ɛ ) ≤ sdp + ɛn²,

Cell-pair type     | Contribution to s̃dp               | Contribution to sdp
Near pair          | 0                                  | 0
Far pair           | 0                                  | 0
Near mixed pair    | negligible                         | ≥ 0
Far mixed pair     | negligible                         | ≥ 0
Active pair        | (1 − v_i · v_j)/(2n²) ± ɛ/n²       | (1 − v_i · v_j)/2

Table 1: The contribution of a cell pair C_i, C_j to s̃dp and sdp in the proof of Theorem 12, broken down by cell-pair type (v_i and v_j are the cells' respective vertices in G).

hence sdp/n² ≥ s̃dp − O(ɛ), as desired.

It remains now to relate opt and õpt; we only sketch the proof. Consider a maximum cut in G, which has value opt, induced by some S ⊆ V. Define A ⊆ S^{d−1} by setting A to be the union of the cells C_i whose corresponding vertices v_i are in S. By definition, the measure of the cut induced by A is at most õpt. But by partitioning cell pairs into the five types above, it can be shown that the measure of this cut is at least opt/n² − O(ɛ), which gives the desired opt/n² ≤ õpt + O(ɛ).

Now the proof of this section's main result is immediate.

Proof of Theorem 7. Combining Theorem 12 with Lemma 10 gives

    opt/sdp = (opt/n²)/(sdp/n²) ≤ (õpt + O(ɛ))/(s̃dp − O(ɛ)) ≤ õpt/s̃dp + O(ɛ) ≤ α + O(ɛ),

as desired.

References

[FS] U. Feige, G. Schechtman: On the optimality of the random hyperplane rounding technique for MAX CUT. Random Structures and Algorithms 20(3): 403-440, 2002.

[Fe] W. Feller: An Introduction to Probability Theory and its Applications. Wiley, 1968.

[GW] M.X. Goemans, D.P. Williamson: Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. ACM 42: 1115-1145, 1995.

[Le] P. Leopardi: A partition of the unit sphere into regions of equal area and small diameter. Electronic Transactions on Numerical Analysis 25: 309-327, 2006.

[MR] S. Mahajan, H. Ramesh: Derandomizing approximation algorithms based on semidefinite programming. SIAM J. Comput. 28(5): 1641-1663, 1999.

[Ré] A. Rényi: Probability Theory. North-Holland, 1970.

[Va] V. V. Vazirani: Approximation Algorithms. Springer, 2004.

[WS] D. P. Williamson, D. B. Shmoys: The Design of Approximation Algorithms. To appear.