Strong Relaxations for Discrete Optimization Problems 20-27/05/16

Lecture: Lovász Theta Body. Introduction to hierarchies.
Lecturer: Yuri Faenza  Scribes: Yuri Faenza

1 Recall: stable sets and perfect graphs

Recall that, for a graph G, we denote by STAB(G) its stable set polytope, that is, the convex hull of the characteristic vectors of its stable sets. The polytope QSTAB(G) = {x ∈ R^n_+ : x(K) ≤ 1 for all cliques K of G} is a relaxation of STAB(G), and it coincides with STAB(G) if and only if G is perfect. In a previous lecture we saw that the best known upper bound on the (linear) extension complexity of STAB(G) when G is perfect is n^{O(log n)}.

2 Lovász Theta Body

Let G(V, E) be a graph and n = |V|. Define the Lovász Theta Body to be

TH(G) = {x ∈ R^n : ∃ symmetric X ∈ R^{(n+1)×(n+1)} such that
    x_i = X_{0i} for i = 1, ..., n,
    X_{00} = 1,
    X_{ii} = X_{i0} for i = 0, ..., n,
    X_{ij} = 0 for ij ∈ E,
    X ⪰ 0}.

Theorem 1  Let G be a graph. Then STAB(G) ⊆ TH(G) ⊆ QSTAB(G). In particular, when G is perfect, STAB(G) has a semidefinite lift of size n+1.

Proof  Let x be the characteristic vector of a stable set of G. Set U = (1, x)^T and X = UU^T. Then X ⪰ 0 by construction, and one easily checks that X satisfies the linear equations: in particular, for ij ∈ E at least one of x_i, x_j is 0, hence X_{ij} = x_i x_j = 0. This shows STAB(G) ⊆ TH(G).

Conversely, let x ∈ TH(G) and X be the semidefinite matrix certifying it. Let C be a clique of G. We show that Σ_{i∈C} x_i ≤ 1. This implies TH(G) ⊆ QSTAB(G). Wlog let C = {1, ..., k}. Consider the principal submatrix X' of X indexed by rows and columns 0, ..., k, and the vector v ∈ R^{|C|+1} with v_0 = 1 and v_i = -1 for i ≠ 0. X' ⪰ 0 by construction, hence

0 ≤ v^T X' v = Σ_{i,j} v_i v_j X_{ij} = 1 + Σ_{i∈C} x_i − 2 Σ_{i∈C} x_i = 1 − Σ_{i∈C} x_i,

since the contribution to the summation of the term with i = j = 0 is 1, of the term with i = j ≠ 0 is x_i (as X_{ii} = X_{i0} = x_i), of the term with i = 0 and j ≠ 0 (resp. j = 0 and i ≠ 0) is −x_j (resp. −x_i), and all other terms give contribution 0, since X_{ij} = 0 whenever ij is an edge of the clique.

3 Introduction to hierarchies

Recall the motivation behind the Chvátal closure seen at the beginning of the course: we wanted to obtain a sequence of increasingly tighter (linear) relaxations, starting from a possibly loose one.
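Before moving on, the construction in the proof of Theorem 1 can be checked numerically. The following is a minimal sketch (the path graph, stable set, and clique below are arbitrary choices for illustration, not part of the notes): it builds X = UU^T for the characteristic vector of a stable set, checks the linear constraints of TH(G), and evaluates v^T X' v for a clique, recovering 1 − Σ_{i∈C} x_i.

```python
# Toy graph: path 1-2-3 (vertices 1..3, edges {1,2} and {2,3}).
n = 3
edges = [(1, 2), (2, 3)]

def theta_matrix(x):
    """Build X = U U^T for U = (1, x), as in the proof of Theorem 1."""
    u = [1] + list(x)
    return [[u[i] * u[j] for j in range(n + 1)] for i in range(n + 1)]

# Characteristic vector of the stable set {1, 3}.
x = [1, 0, 1]
X = theta_matrix(x)

# Linear constraints of TH(G): X_00 = 1, X_ii = X_i0, X_ij = 0 for edges ij.
assert X[0][0] == 1
assert all(X[i][i] == X[i][0] for i in range(n + 1))
assert all(X[i][j] == 0 for (i, j) in edges)

# Clique inequality via the PSD certificate: for a clique C and
# v_0 = 1, v_i = -1 (i in C), one gets v^T X' v = 1 - sum_{i in C} x_i,
# where X' is the principal submatrix indexed by {0} union C.
C = [1, 2]  # the edge {1, 2} is a clique
v = {0: 1}
for i in C:
    v[i] = -1
quad = sum(v[i] * v[j] * X[i][j] for i in v for j in v)
assert quad == 1 - sum(x[i - 1] for i in C)
print("checks passed; v^T X' v =", quad)
```

Since X is a Gram matrix by construction, v^T X' v ≥ 0 holds automatically, which is exactly the clique inequality for C.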
For some problems, e.g. for matching, this procedure was very successful. On the other hand, one of its drawbacks is that optimizing over even the first Chvátal closure is NP-hard.
We present here some alternative techniques to obtain increasingly tighter relaxations (which are usually called rounds) from a starting linear program, in the special (yet very interesting) case when the original polytope is contained in the 0/1 cube. These different techniques go under the generic name of hierarchies and can be presented in a unified setting. Moreover, since a fixed round of any of those hierarchies can be expressed as a linear or semidefinite program of size polynomial in the size of the original polytope, the linear optimization problem over those relaxations will be polynomial-time solvable, provided the original relaxation was compact.

The general framework will therefore be the following: we are given Q ⊆ [0,1]^n and we want to find a linear description of P = conv(Q ∩ {0,1}^n). We assume that Q ⊄ {x : x_i = α} for all i ∈ [n] and α = 0, 1, else we can reduce to a lower-dimensional problem.

3.1 Sequential convexification

Let x ∈ P and suppose that 0 < x_i < 1. Then x is the convex combination of two other points of Q, one with i-th coordinate 0 and the other with i-th coordinate 1. The set of all points of Q that satisfy this latter property is given by the polytope below:

C_i(Q) = conv({x ∈ Q : x_i = 1} ∪ {x ∈ Q : x_i = 0}).

Clearly P ⊆ C_i(Q) ⊆ Q. Because of Balas' theorem on the convex hull of the union of polytopes, the size of (an extended formulation for) C_i(Q) is polynomial in the size of Q. We can clearly iterate this procedure over other coordinates.

Lemma 2  C_n(C_{n−1}(... C_1(Q)) ...) = P.

Proof  We show by induction on i that all points in C_i(C_{i−1}(...)) can be obtained as convex combinations of points of Q whose coordinates 1, ..., i are 0 or 1. This is true by construction for i = 1. Suppose it true up to level i − 1, and pick x ∈ C_i(C_{i−1}(...)). Then x is the convex combination of two points, say x^1 and x^2, of C_{i−1}(...), having the i-th coordinate equal to 1 (resp. 0). By induction, x^1 (resp. x^2) is the convex combination of a set X^1 (resp. X^2) of points of Q having all coordinates 1, ..., i−1 equal to 0 or 1; moreover, since x^1_i = 1 (resp. x^2_i = 0) and 0 ≤ z_i ≤ 1 for all z ∈ Q, every point of X^1 (resp. X^2) has coordinate i equal to 1 (resp. 0).
Hence, all points in X^1 and X^2 belong to Q and have coordinates 1, ..., i equal to 0 or 1. We can then express x as a convex combination of those points, as required; the case i = n gives the lemma.

In order to obtain generalizations of this sequential convexification procedure, it will be better to formulate it in terms of cones. Recall that the homogenization of a closed convex set R ⊆ R^n is the set

K(R) = {(x_0, x) ∈ R^{n+1} : x_0 ≥ 0 and x/x_0 ∈ R} = cone{(1, x) : x ∈ R},

and consequently R = {x ∈ R^n : (1, x) ∈ K(R)}. Therefore, for all purposes, a linear description of a polytope R is as good as a linear description of K(R). We now provide a description of K(C_i(Q)). From now on, we will always work with cones K ⊆ R^{n+1} such that K ∩ {x̂ : x̂_0 = 1} ⊆ {1} × [0, 1]^n. Let

M_i(K) = {(x̂, v̂) : x̂ = (x_0, x), v̂ = (v_0, v) ∈ R^{n+1}, v_i = v_0 = x_i, v̂ ∈ K, x̂ − v̂ ∈ K}

and N_i(K) = proj_{x̂}(M_i(K)). Clearly, both M_i(K) and N_i(K) are cones.

Lemma 3  K(P) ⊆ K(C_i(Q)) = N_i(K(Q)) ⊆ K(Q).
Proof  Since P ⊆ C_i(Q) ⊆ Q, we immediately obtain K(P) ⊆ K(C_i(Q)) ⊆ K(Q). To show K(C_i(Q)) = N_i(K(Q)), observe:

C_i(Q) = conv{x ∈ R^n : ∃ v, w ∈ R^n, λ ∈ [0,1] : v_i = λ, w_i = 0, v ∈ λQ, w ∈ (1−λ)Q, x = v + w}
       = conv{x ∈ R^n : ∃ v ∈ R^n, λ ∈ [0,1] : v_i = λ = x_i, v ∈ λQ, x − v ∈ (1−λ)Q}
       = conv{x ∈ R^n : ∃ v̂ = (v_0, v) ∈ R^{n+1}, v_0 ∈ [0,1] : v_i = v_0 = x_i, v̂ ∈ K(Q), (1, x) − v̂ ∈ K(Q)}.

Consequently,

K(C_i(Q)) = cone{x̂ = (1, x) ∈ R^{n+1} : ∃ v̂ = (v_0, v) ∈ R^{n+1}, v_0 ∈ [0,1] : v_i = v_0 = x_i, v̂ ∈ K(Q), x̂ − v̂ ∈ K(Q)}
          = {x̂ ∈ R^{n+1} : ∃ v̂ = (v_0, v) ∈ R^{n+1} : v_i = v_0 = x̂_i, v̂ ∈ K(Q), x̂ − v̂ ∈ K(Q)},

as required.

From Lemmas 2 and 3 we immediately deduce that repeating the convexification operation for all i produces K(P).

Lemma 4  N_n(N_{n−1}(... N_1(K(Q)) ...)) = K(P).

3.2 Simultaneous convexification

Applying the convexification operation for variable i cuts off the points in Q that cannot be written as convex combinations of points of the two faces of Q induced by x_i = 0 and x_i = 1. Call those faces F_i^0 and F_i^1. Simultaneous convexification excludes all points which cannot be written as a convex combination of points of F_i^0 and F_i^1, for every i simultaneously. This is captured by the following cone:

N_0(K) = {x̂ ∈ R^{n+1} : ∃ v̂^1, ..., v̂^n ∈ R^{n+1} : (x̂, v̂^i) ∈ M_i(K) for i = 1, ..., n}.

From the previous definition, we immediately obtain:

Lemma 5  K(P) ⊆ N_0(K(Q)) ⊆ N_i(K(Q)) ⊆ K(Q) for all i ∈ [n], and recursively applying n times the operator N_0(·) starting from K(Q) gives K(P).

Let us give a matrix formulation of the operator N_0(·).

Lemma 6  N_0(K) = {x̂ ∈ R^{n+1} : ∃ X ∈ R^{(n+1)×(n+1)} : x̂_i = X_{0i} for i = 0, ..., n, X ∈ X_K}, where for a cone K,

X_K = {X ∈ R^{(n+1)×(n+1)} : X e_i ∈ K and X(e_0 − e_i) ∈ K for i = 1, ..., n, X_{ii} = X_{i0} for i = 0, ..., n}.

Proof  Set x̂ = X e_0 and v̂^i = X e_i, for all i = 1, ..., n.
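The matrix formulation of Lemma 6 can be illustrated on a 0/1 point. Below is a minimal sketch, where Q is taken (as an assumed toy instance, not from the notes' text) to be the edge relaxation of the stable set polytope of a triangle; it verifies that X = x̂x̂^T lies in X_K for K = K(Q).

```python
# Toy check of the matrix operator of Lemma 6: for a 0/1 point x of Q,
# the matrix X = (1,x)(1,x)^T lies in X_K for K = K(Q).
# Q is the edge relaxation of the stable set polytope of a triangle
# (an assumed example for illustration).

n = 3
edges = [(1, 2), (2, 3), (1, 3)]

def in_K(z):
    """Membership in K(Q) = {(z_0,...,z_n) >= 0 : z_i + z_j <= z_0, ij in E}."""
    return all(t >= 0 for t in z) and all(z[i] + z[j] <= z[0] for (i, j) in edges)

x = [1, 0, 0]                      # characteristic vector of the stable set {1}
xhat = [1] + x
X = [[a * b for b in xhat] for a in xhat]

cols = list(zip(*X))               # column i of X is X e_i
for i in range(1, n + 1):
    assert in_K(cols[i])                                          # X e_i in K(Q)
    assert in_K([cols[0][r] - cols[i][r] for r in range(n + 1)])  # X(e_0 - e_i) in K(Q)
for i in range(n + 1):
    assert X[i][i] == X[i][0]                                     # X_ii = X_i0
print("X lies in X_K for K = K(Q)")
```

Note that X e_i = x_i·x̂ and X(e_0 − e_i) = (1 − x_i)·x̂ here, i.e. each column condition reduces to x̂ ∈ K(Q) or to the trivial membership 0 ∈ K(Q); this is exactly the observation used later in the proof of Lemma 7.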
3.3 A different view on simultaneous convexification

We now give an equivalent procedure to deduce the operator N_0(·). Let Q = {x ∈ R^n : Ax ≤ b}. Write the l-th inequality of Ax ≤ b in homogenized form as g_l^T x̂ = a_0 x_0 + a^T x ≥ 0, where x̂ = (x_0, x). Then we can write

K(Q) = {x̂ = (x_0, x) ∈ R^{n+1} : g_l^T x̂ ≥ 0 for all l}.

Note that the inequalities x_i (g_l^T x̂) ≥ 0 and (x_0 − x_i)(g_l^T x̂) ≥ 0 are valid for P, where as usual we write P = conv(Q ∩ {0,1}^n), for all i ∈ [n]. The set of all such inequalities gives a quadratic system:

S = {x ∈ R^n : x_i (g_l^T x̂) = g_l^T x̂ x̂^T e_i ≥ 0 for all i ∈ [n] and all l, and
               (1 − x_i)(g_l^T x̂) = g_l^T x̂ x̂^T (e_0 − e_i) ≥ 0 for all i ∈ [n] and all l},

where x̂ = (1, x)^T. The previous system can be linearized by replacing x̂ x̂^T with a matrix variable X. We then obtain exactly N_0(K(Q)).

3.4 Adding semidefinite constraints: Lovász–Schrijver's N_+ operator

The matrix formulation from Lemma 6 allows us to impose for free some additional constraints on the system.

Lemma 7  Let Q ⊆ [0,1]^n be a polytope and P = conv(Q ∩ {0,1}^n). Then K(P) ⊆ N_+(K(Q)) ⊆ N_0(K(Q)), where

N_+(K) = {x̂ ∈ R^{n+1} : ∃ X : x̂_i = X_{0i} for i = 0, ..., n, X ∈ X_K^+}

and X_K^+ = {X ∈ X_K : X ⪰ 0}.

Proof  The only statement that requires a proof is K(P) ⊆ N_+(K(Q)). Fix a point x ∈ Q ∩ {0,1}^n, let x̂ = (1, x)^T and X = x̂ x̂^T. Then X ⪰ 0. Using the fact that x ∈ {0,1}^n, we deduce that X e_i = x_i x̂ and X(e_0 − e_i) = (1 − x_i) x̂ are each equal either to x̂ or to 0, and X_{ii} = x_i^2 = x_i = X_{i0}.

Note that N_+(K(Q)) is not a polyhedron anymore. However, the conic setting in which we phrased it allows us to iteratively apply the N_+(·) operator.

Let us see an example. Recall the fractional relaxation of the stable set polytope of a graph G(V, E):

Q = {x ∈ R^n_+ : x_i + x_j ≤ 1 for all ij ∈ E}.

We have K(Q) = {(x_0, x) ∈ R^{n+1}_+ : x_i + x_j ≤ x_0 for all ij ∈ E}. We show that N_+(K(Q)) ⊆ K(TH(G)). Note that K(TH(G)) is a cone in the variables (x_0, x) whose constraints are exactly those describing TH(G), with the exception of X_{00} = 1, which is replaced by X_{00} = x_0. Hence, in order to conclude
N_+(K(Q)) ⊆ K(TH(G)), it is enough to show that X_{ij} = 0 for ij ∈ E is implied by the constraints of N_+(K(Q)). Fix ij ∈ E. We have X e_i ∈ K(Q), hence X_{ii} + X_{ij} ≤ X_{i0} and X_{ij} ≥ 0; using X_{ii} = X_{i0} we conclude X_{ij} = 0, as required.

So, in particular, one round of N_+ implies all clique inequalities. Those inequalities are known to be hard to obtain for the Chvátal closure (in particular, no constant number of rounds of the Chvátal closure suffices, in general, to deduce all clique inequalities).

3.5 The Sherali–Adams and Lasserre hierarchies

The proof of Lemma 7 suggests defining matrices associated to vectors x ∈ R^n with the property that, when x is a 0/1 point in Q, the matrix is somehow well-behaved. Imposing this good behaviour on the matrix will therefore cut off some points of Q that are not points of P.

Consider a vector y ∈ R^{2^n} indexed by subsets of [n]. Define the matrix Y(y) ∈ R^{2^n × 2^n} as follows: Y(y)_{I,J} = y_{I∪J} for all I, J ⊆ [n].

Figure 1: Matrix Y(y) ∈ R^{2^3 × 2^3}. For r = 2, the submatrix induced by U = {1,2} (rows and columns in green) is a principal submatrix of the matrix induced by P_2([3]) (rows and columns in green and red).

        ∅      1      2      3      12     13     23     123
 ∅      y_∅    y_1    y_2    y_3    y_12   y_13   y_23   y_123
 1      y_1    y_1    y_12   y_13   y_12   y_13   y_123  y_123
 2      y_2    y_12   y_2    y_23   y_12   y_123  y_23   y_123
 3      y_3    y_13   y_23   y_3    y_123  y_13   y_23   y_123
 12     y_12   y_12   y_12   y_123  y_12   y_123  y_123  y_123
 13     y_13   y_13   y_123  y_13   y_123  y_13   y_123  y_123
 23     y_23   y_123  y_23   y_23   y_123  y_123  y_23   y_123
 123    y_123  y_123  y_123  y_123  y_123  y_123  y_123  y_123

For an inequality g(x) = a_0 + a^T x ≥ 0 and a vector y ∈ R^{2^n}, consider the shift operator ∗ : R^{n+1} × R^{2^n} → R^{2^n} defined as

(g ∗ y)_I = Σ_{i ∈ [n]_0} a_i y_{I∪{i}},

where [n]_0 = [n] ∪ {0} and y_{I∪{0}} := y_I. Now, for a point x ∈ Q ∩ {0,1}^n, set y_I = Π_{i∈I} x_i and note that Y(y) = y y^T is positive semidefinite and satisfies Y(y)_{∅,∅} = 1. Moreover, for any linear inequality g(x) = a_0 + a^T x ≥ 0 valid for Q, we have

(g ∗ y)_I = Σ_{i∈[n]} a_i y_{I∪{i}} + a_0 y_I = Σ_{i∈[n]} a_i y_{{i}} y_I + a_0 y_I = g(x) y_I,  and hence  Y(g ∗ y) = g(x) Y(y) ⪰ 0.

Hence, we can safely impose the constraints

Y_{∅,∅} = 1,  Y(y) ⪰ 0,  and  Y(g ∗ y) ⪰ 0 for all inequalities g(x) ≥ 0 describing Q.
(1)
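For a 0/1 point, the identities Y(y) = yy^T and Y(g ∗ y) = g(x)Y(y) can be verified by brute force. A minimal sketch (the point and the inequality below are assumed toy data for illustration):

```python
from itertools import combinations, chain

n = 3
subsets = [frozenset(c) for c in chain.from_iterable(
    combinations(range(1, n + 1), k) for k in range(n + 1))]

# A 0/1 point and its moment vector y_I = prod_{i in I} x_i.
x = {1: 1, 2: 0, 3: 1}
y = {I: 1 if all(x[i] for i in I) else 0 for I in subsets}

def Y(vec):
    """Matrix Y(vec)_{I,J} = vec_{I union J}."""
    return {(I, J): vec[I | J] for I in subsets for J in subsets}

def shift(a0, a, vec):
    """(g * vec)_I = a_0 vec_I + sum_i a_i vec_{I union {i}} for g(x) = a_0 + a^T x."""
    return {I: a0 * vec[I] + sum(a[i] * vec[I | {i}] for i in a) for I in subsets}

# A linear inequality valid at x: g(x) = 1 - x_1 - x_2 >= 0
# (e.g. the edge inequality x_1 + x_2 <= 1, assumed to describe Q).
a0, a = 1, {1: -1, 2: -1}
gx = a0 + sum(a[i] * x[i] for i in a)

M, Mg = Y(y), Y(shift(a0, a, y))
assert M[(frozenset(), frozenset())] == 1          # Y_{emptyset,emptyset} = 1
assert all(M[(I, J)] == y[I] * y[J] for I in subsets for J in subsets)   # Y(y) = y y^T
assert all(Mg[(I, J)] == gx * M[(I, J)] for I in subsets for J in subsets)
print("Y(g*y) = g(x) Y(y) with g(x) =", gx)
```

The key step, as in the text, is that y_{I∪{i}} = x_i y_I holds for every 0/1 point, so the shift of y by g is just a rescaling of y by the scalar g(x).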
But those constraints are exponentially many, and we want a compact relaxation. The Sherali–Adams and Lasserre hierarchies impose some weaker conditions on the matrix Y: this will trade exactness for compactness. For a set U, let P(U) be the power set of U, let P_r(U) be the collection of all subsets of U of size at most r, and let g_l(x) ≥ 0, l = 1, ..., m, be the inequalities defining Q. We write Y for Y(y) and, for a matrix Y and a family F ⊆ 2^{[n]}, we write Y_F for the principal submatrix of Y whose rows and columns range over F. Let r ∈ [n]_0. Let

SA^r(Q) = {x ∈ R^n : ∃ y ∈ R^{2^n} :
    x_i = Y_{∅,{i}} for i = 1, ..., n,  Y_{∅,∅} = 1,
    Y_{P(U)} ⪰ 0 for all U ⊆ [n] with |U| ≤ r,
    Y(g_l ∗ y)_{P(W)} ⪰ 0 for all W ⊆ [n] with |W| ≤ r and all l}

and

Las^r(Q) = {x ∈ R^n : ∃ y ∈ R^{2^n} :
    x_i = Y_{∅,{i}} for i = 1, ..., n,  Y_{∅,∅} = 1,
    Y_{P_r([n])} ⪰ 0,
    Y(g_l ∗ y)_{P_r([n])} ⪰ 0 for all l}.

Note that, in the previous definitions, even if we are asking for a vector y ∈ R^{2^n}, we are constructing the submatrices of Y from a subvector of y of size polynomial in n for fixed r. Hence, the associated semidefinite programs are of polynomial size.

Lemma 8  For each r ∈ N, P ⊆ Las^r(Q) ⊆ SA^r(Q) ⊆ N_{0,r}(K(Q)) ⊆ Q, where we denote by N_{0,r}(K(Q)) the convex set obtained by applying r times the operator N_0 and then projecting to the original variables. Moreover, P = Las^n(Q) = SA^n(Q).

Proof  Note that all the matrices that are imposed to be positive semidefinite in Las^r(Q) are principal submatrices of the matrices Y(y) and Y(g ∗ y). Hence, the corresponding constraints are relaxations of constraints (1), showing P ⊆ Las^r(Q). A similar argument shows Las^r(Q) ⊆ SA^r(Q). We defer the proof of SA^r(Q) ⊆ N_{0,r}(K(Q)) to the exercise session. Then, using Lemma 5, P = Las^n(Q) = SA^n(Q) follows.

4 Applications of hierarchies

4.1 Set cover

By definition, when we convexify Q on variable i, any point in C_i(Q) can be written as a convex combination of points x^1, x^2 ∈ Q such that x^1_i = 1 and x^2_i = 0.
In simultaneous convexification, any point x ∈ proj_x(N_0(K(Q))) can, for any i ∈ [n], be written as a convex combination of points of Q with coordinate i equal to 0 or 1. Since in the proof of Lemma 8 we argued that the Sherali–Adams operator is stronger than simultaneous convexification, let us state this fact in terms of SA^t.

Lemma 9  Let x ∈ SA^t(Q) and i ∈ [n] with 0 < x_i < 1. Then x ∈ conv{z ∈ SA^{t−1}(Q) : z_i ∈ {0, 1}}.

As done in the proof of Lemma 2, we can iterate the previous lemma and conclude the following.

Lemma 10  Let x ∈ SA^t(Q) and S ⊆ [n] with |S| ≤ t. Then x ∈ conv{z ∈ SA^{t−|S|}(Q) : z_i ∈ {0, 1} for i ∈ S}.
We now use the previous lemma to give an algorithm that runs in subexponential time and achieves an approximation factor of (1 − ε) log n + O(1) for set cover. Note that it is widely believed that no polynomial-time algorithm achieves this approximation factor.

Recall that in set cover we are given n objects and m sets S = {S_1, ..., S_m} with costs c_1, ..., c_m, each containing some of those objects. The goal is to find a collection of sets whose union contains all objects, at minimum total cost. The algorithm is based on solving a linear optimization problem over SA^{n^ε}(Q), where Q is the following LP relaxation:

Σ_{i : j ∈ S_i} x_i ≥ 1  for all objects j                      (2)
Σ_{i ∈ [m]} c_i x_i ≤ OPT,  x ∈ [0, 1]^m                        (3)

and rounding its optimal solution. Note that we are assuming that we know the optimal value OPT. This is not cheating: standard techniques in approximation algorithms imply that we can round costs and guess a small number of values of OPT, repeat the algorithm for all those values, and pick the best solution.

First part: Let x* be an optimal solution of min{c^T x : x ∈ SA^{n^ε}(Q)}. Among the indices i with x*_i > 0, pick the one whose associated set covers the biggest set of objects. Then we know that x* can be written as a convex combination of points x^1, x^2 ∈ SA^{n^ε − 1}(Q) with x^1_i = 1. Replace x* with x^1 and iterate n^ε times, each time choosing the set that covers the biggest number of uncovered objects, among the sets whose corresponding variable is fractional. Note that, if at some point no variable is fractional, we have found a feasible integral solution of value at most OPT, hence we are done. Suppose this is not the case, let S' be the collection of sets picked by this procedure, and let x̄ be the final point obtained. By Lemma 10, x̄ ∈ Q, hence Σ_{i∈[m]} c_i x̄_i ≤ OPT.

Second part: Now consider the residual set cover problem, once all sets in S' have been picked. Note that each of the remaining sets covers at most n^{1−ε} uncovered objects, since otherwise, together with the n^ε sets picked before, each of which covered at least as many objects as it does, we would have covered more than n objects.
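The rounding fact invoked below rests on the classical greedy algorithm for set cover: repeatedly pick the most cost-effective set on the still-uncovered objects. A minimal stdlib sketch (the instance data are assumed toy values for illustration):

```python
def greedy_set_cover(universe, sets, costs):
    """Classical greedy: repeatedly pick the set minimizing cost per newly
    covered object. Its cost is within a 1 + log k factor of the LP value,
    where k is the size of the largest set."""
    uncovered = set(universe)
    picked = []
    while uncovered:
        # most cost-effective set with respect to the uncovered objects
        i = min((i for i in range(len(sets)) if sets[i] & uncovered),
                key=lambda i: costs[i] / len(sets[i] & uncovered))
        picked.append(i)
        uncovered -= sets[i]
    return picked

universe = range(1, 7)
sets = [{1, 2, 3, 4}, {3, 4, 5}, {5, 6}, {1, 6}]
costs = [3.0, 2.0, 1.0, 1.0]
sol = greedy_set_cover(universe, sets, costs)
assert set().union(*(sets[i] for i in sol)) == set(universe)
print("picked sets:", sol, "cost:", sum(costs[i] for i in sol))
```

On this instance the greedy first takes the set {5, 6} (ratio 1/2), then {1, 2, 3, 4}, covering everything.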
We can then exploit the following well-known fact.

Theorem 11  Let x satisfy the linear relaxation of a set cover instance given by (2) and (3). Then an integral solution satisfying the covering constraints (2) of cost at most (1 + log k) Σ_{i=1}^m c_i x_i can be found in polynomial time, where k is the size of the largest set in the set cover instance.

Consider the solution to the original set cover instance that picks the sets S'' given by the previous theorem applied to the residual instance, plus those in S' selected in the first part. Clearly it is feasible. Its cost is

c(S') + c(S'') ≤ c^T x̄ + (1 + log(n^{1−ε})) OPT ≤ ((1 − ε) log n + O(1)) OPT.

4.2 The decomposition theorem for the Lasserre hierarchy, with an application to max knapsack

Note that so far we did not use any property specific to Lasserre in deriving the previous results. The next theorem, whose proof we omit, holds for the Lasserre hierarchy only.

Theorem 12 [Decomposition theorem for Lasserre]  Let x ∈ Las^t(Q) and S ⊆ [n] with |S ∩ supp(z)| ≤ k for all z ∈ Q ∩ {0,1}^n. Then x ∈ conv{z ∈ Las^{t−k}(Q) : z_i ∈ {0, 1} for i ∈ S}.
The previous theorem allows us to show that the integrality gap of the Lasserre hierarchy quickly converges to 1 for the max knapsack problem. This separates Lasserre from the Sherali–Adams hierarchy, for which this quantity stays arbitrarily close to 2 after linearly many rounds. Recall that the max knapsack problem is defined as

max c^T x  s.t.  w^T x ≤ β,  x ∈ {0, 1}^n,

and that the integrality gap IG_t of the t-th round of Lasserre is max_{x ∈ Las^t(Q)} c^T x / OPT, where Q is the classical linear relaxation of the knapsack polytope. It is well known that max_{x ∈ Q} c^T x ≤ OPT + max_i c_i.

Theorem 13  For any knapsack instance and any t ≥ 2, IG_t ≤ 1 + O(1/t).

Proof  Let x* be an optimal solution of max_{x ∈ Las^t(Q)} c^T x. Let S = {i ∈ [n] : c_i > OPT/(t+1)}. Note that, for any integral solution I to the knapsack problem, one has |I ∩ S| ≤ t, since otherwise the profit of I would be strictly bigger than the profit of an optimal solution. Hence we can apply Theorem 12 with k = t. That is, x* lies in the convex hull of feasible points of Q that have integer values in the coordinates corresponding to S. Since the objective function is linear, one of those points, say x', has profit greater than or equal to the profit of x*. Let S_1 be the set of coordinates from S that are set to 1 in x'. We deduce

c^T x* ≤ c^T x' ≤ Σ_{i∈S_1} c_i + LP,

where LP is the optimal value of the residual knapsack relaxation over the objects not in S, with capacity β − Σ_{i∈S_1} w_i. Using the bound on the integrality gap at the first round and the fact that all objects not in S have profit at most OPT/(t+1), we obtain LP ≤ OPT' + OPT/(t+1), where OPT' is the optimal value of the integer version of LP. Since picking the objects in S_1 together with an integral optimal solution of the residual instance is feasible for the original problem, Σ_{i∈S_1} c_i + OPT' ≤ OPT. We conclude

c^T x* ≤ Σ_{i∈S_1} c_i + OPT' + OPT/(t+1) ≤ OPT + OPT/(t+1).

5 Notes

The Lovász theta body has been studied in many papers. The results presented here were proved in [3, 6, 7], and we follow the presentation given in [2]. We refer the reader to [5] for an introduction to the different hierarchies, to [1] for the application to set cover, and to [4] for the results of Section 4.2. Our presentation of hierarchies partly follows [8, 9].
References

[1] E. Chlamtac, Z. Friggstad, and K. Georgiou. Lift-and-project methods for set cover and knapsack. WADS 2013, pp. 256-267.
[2] M. Conforti, G. Cornuejols, and G. Zambelli. Integer Programming. Springer, 2014.
[3] M. Grötschel, L. Lovász, and A. Schrijver. Relaxations of vertex packing. Journal of Combinatorial Theory (B) 40, pp. 330-343, 1986.
[4] A. Karlin, C. Mathieu, and C. Nguyen. Integrality gaps of linear and semidefinite programming relaxations for knapsack. In IPCO, pp. 301-314, 2011.
[5] M. Laurent. A comparison of the Sherali-Adams, Lovász-Schrijver, and Lasserre relaxations for 0-1 programming. Mathematics of Operations Research, 28(3), pp. 470-496, 2003.
[6] L. Lovász. On the Shannon capacity of a graph. IEEE Transactions on Information Theory 25, pp. 1-7, 1979.
[7] L. Lovász and A. Schrijver. Cones of matrices and set-functions and 0-1 optimization. SIAM Journal on Optimization, 1(2), pp. 166-190, 1991.
[8] T. Rothvoss. The Lasserre hierarchy in approximation algorithms. Lecture notes for the MAPSP 2013 tutorial, 2013.
[9] M. Zuckerberg. A Set Theoretic Approach to Lifting Procedures for 0,1 Integer Programming. Ph.D. dissertation, Columbia University, 2009.