Approximate Constraint Satisfaction Requires Large LP Relaxations


Noah Fleming
April 19, 2018

Linear programming is a very powerful tool for attacking optimization problems. Techniques such as the ellipsoid method show that linear programs are solvable in polynomial time. Furthermore, it is known that linear programming is P-complete. Therefore, if one were to show that some NP-hard problem admitted a polynomial-size linear program, then P = NP. In an attempt to rule out this approach, Yannakakis [4] gave a framework for proving lower bounds on a large class of linear programs known as extended formulations.

Consider the 3XOR_n problem on n variables. Deciding whether all constraints of an instance are simultaneously satisfiable is not NP-hard (it reduces to Gaussian elimination over F_2), but the problem will serve as a good running example. An instance Π ∈ 3XOR_n consists of m parity constraints {P_1, ..., P_m}, P_l : {±1}^n → {0,1}, where P_l(x) := [x_i x_j x_k = a_l] for some i, j, k ∈ [n] and a_l ∈ {±1}; the goal is to maximize the number of satisfied constraints. Note that Π can also be represented uniquely as a multilinear polynomial over {±1}^n by taking the Fourier expansion: we can rewrite each P_l(x) = [x_i x_j x_k = a_l] as

P_l(x) := (1 + a_l x_i x_j x_k) / 2.

The value of Π on an assignment x ∈ {−1,1}^n is given by

Π(x) = (1/m) Σ_{l∈[m]} P_l(x),

which is the fraction of constraints satisfied by the assignment x. We will denote by

opt(Π) = max_{x∈{−1,1}^n} Π(x)

the largest fraction of constraints of Π satisfiable by any assignment x ∈ {−1,1}^n. If we want to express this as a linear program, then we need to linearize this function. To do so, we associate some ordering with the 2(n choose 3) possible 3XOR_n constraints, P_1, ..., P_{2(n choose 3)}.
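To make these objects concrete, here is a minimal sketch (with hypothetical toy data) of evaluating Π(x) via the multilinear form above and computing opt(Π) by brute force; it is illustrative only, since real instances are far too large to enumerate.

```python
from itertools import product

# Toy 3XOR instance: each constraint (i, j, k, a) encodes x_i * x_j * x_k = a,
# with x in {-1, +1}^n and a in {-1, +1}. The data below is hypothetical.
n = 4
constraints = [(0, 1, 2, +1), (1, 2, 3, -1), (0, 2, 3, -1)]

def value(x, cs):
    # Pi(x) = (1/m) * sum_l (1 + a_l * x_i * x_j * x_k) / 2
    return sum((1 + a * x[i] * x[j] * x[k]) / 2 for i, j, k, a in cs) / len(cs)

def opt(cs, n):
    # opt(Pi): maximum fraction of satisfied constraints over all 2^n assignments
    return max(value(x, cs) for x in product([-1, 1], repeat=n))

print(opt(constraints, n))  # 1.0: this toy instance happens to be satisfiable
```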

A natural way of linearizing such a function is to associate with each 3XOR_n instance Π with m constraints a vector Π̃ ∈ R^{2(n choose 3)}, whose ith entry is 1/m if Π contains constraint P_i and 0 otherwise. Similarly, we can associate with each assignment x ∈ {−1,1}^n a vector x̃ ∈ R^{2(n choose 3)} in which x̃_i = 1 if P_i(x) = 1 and 0 otherwise. This guarantees, for every 3XOR_n instance Π and assignment x ∈ {−1,1}^n, that ⟨Π̃, x̃⟩ = Π(x).

This lends itself to a natural linear program: let P ⊆ R^{2(n choose 3)} be the convex hull of all x̃ for x ∈ {−1,1}^n; the linear program is given by

L(Π) = max_{y∈P} ⟨y, Π̃⟩.

This polytope P has vertices corresponding to the points x̃ for x ∈ {−1,1}^n and facets corresponding to the encodings Π̃ of 3XOR_n instances Π. Therefore, the value returned by optimizing over P will be opt(Π). Unfortunately, the polytope P has an exponential number of facets and therefore cannot be optimized over efficiently.

One possible way to overcome this issue is to find some new polytope P' in a higher-dimensional space R^D with far fewer facets, such that there is a linear projection from P' down to P. We could then optimize over the new polytope P' instead of optimizing over P. Such a polytope P' is known as an extended formulation of the polytope P. The size of an extended formulation is the number of facets of the polytope, while the extension complexity of the base polytope P, denoted xc(P), is the size of the smallest extended formulation of P. We stress that an extended formulation P' depends only on the input size and not on the particular instance Π ∈ 3XOR_n that we want to compute; the instance Π appears only in the objective function.

Yannakakis gave a beautiful characterization of the extension complexity of a polytope in terms of the non-negative rank of its slack matrix. Consider a linear program P computing 3XOR_n. The slack matrix M_S has rows corresponding to the instances Π ∈ 3XOR_n and columns corresponding to the vertices x̃ of P. The entry at row Π̃, column x̃ is the slack between that instance and that vertex,

M_S[Π̃, x̃] := L(Π) − ⟨x̃, Π̃⟩, where L(Π) = max_{y∈P} ⟨y, Π̃⟩.

The non-negative rank of a matrix M, denoted rk₊(M), is the smallest dimension r such that M can be written as a product of two non-negative matrices V and F with inner dimension r.

Theorem 1 (Yannakakis [4]). For any polytope P, xc(P) + 1 = rk₊(M_S).

The proof relies on Farkas' Lemma.
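Before moving to the proof, the linearization above can be sanity-checked directly for tiny n. A minimal sketch (hypothetical toy instance) that builds the vectors Π̃ and x̃ and verifies ⟨Π̃, x̃⟩ = Π(x) on every assignment:

```python
from itertools import combinations, product

n = 4
# Fix an ordering of all 2 * C(n,3) possible constraints (i, j, k, a).
all_constraints = [(i, j, k, a) for (i, j, k) in combinations(range(n), 3)
                   for a in (-1, +1)]

def encode_instance(cs):
    # Pi~: entry 1/m for each constraint the instance contains, 0 otherwise
    v = [0.0] * len(all_constraints)
    for c in cs:
        v[all_constraints.index(c)] = 1.0 / len(cs)
    return v

def encode_assignment(x):
    # x~: entry 1 iff the assignment satisfies that constraint
    return [(1 + a * x[i] * x[j] * x[k]) // 2 for i, j, k, a in all_constraints]

cs = [(0, 1, 2, +1), (1, 2, 3, -1)]  # hypothetical instance
for x in product([-1, 1], repeat=n):
    lhs = sum(p * q for p, q in zip(encode_instance(cs), encode_assignment(x)))
    rhs = sum((1 + a * x[i] * x[j] * x[k]) / 2 for i, j, k, a in cs) / len(cs)
    assert abs(lhs - rhs) < 1e-9  # <Pi~, x~> = Pi(x)
```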

Figure 1: Representation of Theorem 1: the decomposition of the slack matrix into two non-negative matrices with inner dimension r.

Lemma 1 (Farkas' Lemma). Let P be a polytope with facets defined by inequalities {A_1x ≤ b_1, ..., A_rx ≤ b_r}, and let Cx ≤ c be an inequality that is valid for P (that is, every point α ∈ P satisfies Cα ≤ c). Then there exist λ_0, ..., λ_r ∈ R_{≥0} such that

c − Cx = λ_0 + Σ_{i=1}^{r} λ_i (b_i − A_i x).

We will only prove the forward direction of Theorem 1, since it is all that we will need.

Proof (of Theorem 1). Let P' be an extended formulation of P such that P' has r facets, defined by inequalities A_1y ≤ b_1, ..., A_ry ≤ b_r. Observe that for every Π ∈ 3XOR_n, the inequality opt(Π) − ⟨Π̃, y⟩ ≥ 0 is valid for the polytope P' (identifying each y ∈ P' with its projection into P), and furthermore opt(Π) = L(Π) because P computes 3XOR_n exactly. Applying Farkas' Lemma, we can write

L(Π) − ⟨Π̃, y⟩ = λ_0(Π̃) + Σ_{i=1}^{r} λ_i(Π̃) (b_i − ⟨A_i, y⟩),    (1)

for some λ_0(Π̃), ..., λ_r(Π̃) ∈ R_{≥0}. Now, because there is a linear projection from P' to P, for each vertex x̃ of P there is a point v_x̃ of P' that projects to it. Restricting to these points,

L(Π) − ⟨Π̃, x̃⟩ = λ_0(Π̃) + Σ_{i=1}^{r} λ_i(Π̃) (b_i − ⟨A_i, v_x̃⟩).    (2)

Furthermore, since the x̃ are in one-to-one correspondence with the x ∈ {−1,1}^n, we can view each b_i − ⟨A_i, v_x̃⟩ as a non-negative function q_i : {−1,1}^n → R_{≥0}, where q_i(x) = b_i − ⟨A_i, v_x̃⟩.

Therefore, we can rewrite equation (2) as

L(Π) − ⟨Π̃, x̃⟩ = λ_0(Π̃) + Σ_{i=1}^{r} λ_i(Π̃) q_i(x);

this is the slack between vertex x̃ and instance Π. We now construct the non-negative matrices V and F with inner dimension r + 1. Let the rows of V be indexed by the Π̃ for Π ∈ 3XOR_n and the columns of F be indexed by the x̃ for x ∈ {−1,1}^n. The Π̃th row of V will be the vector [λ_0(Π̃), ..., λ_r(Π̃)] corresponding to Π. The ith row of F is the truth table of q_i: its (i, j)th entry is the evaluation q_i(x_j) on the jth assignment. The (r+1)st row of F is the all-ones vector, which pairs with λ_0. This can be seen in Figure 1. Therefore, the inner product between the Π̃th row of V and the x̃th column of F is

λ_0(Π̃) + Σ_{i=1}^{r} λ_i(Π̃) q_i(x) = L(Π) − ⟨Π̃, x̃⟩. □

Note: Because the extended formulation computes 3XOR_n exactly, L(Π) = opt(Π). Furthermore, because the rows and columns of the slack matrix are in one-to-one correspondence with the Π ∈ 3XOR_n and the x ∈ {−1,1}^n, the (Π̃, x̃)th entry of the slack matrix is

M_S[Π̃, x̃] = opt(Π) − Π(x),

because ⟨x̃, Π̃⟩ = Π(x). Therefore, the slack matrix will be the same for any base polytope P; the particular linearization is irrelevant. More generally, then, we can define an extended formulation that exactly computes 3XOR_n as a polytope P' ⊆ R^D such that

1. for every assignment x ∈ {−1,1}^n there is a vector x̃ ∈ P', and for every instance Π ∈ 3XOR_n there is a vector Π̃ ∈ R^D, such that Π(x) = ⟨x̃, Π̃⟩;

2. opt(Π) = max_{y∈P'} ⟨y, Π̃⟩ for every Π ∈ 3XOR_n.

The extension complexity of 3XOR_n, denoted xc(3XOR_n), is then the size of the smallest extended formulation for 3XOR_n.

The key fact from Theorem 1 that we will use is that if an extended formulation P' of size r computes 3XOR_n, then for every instance Π there exists a representation

L(Π) − Π = λ_0(Π) + Σ_{i=1}^{r} λ_i(Π) q_i,

where each q_i is a slack function of P'. We will call this representation an extended formulation witness, because it witnesses that P' computes Π. From now on, we will write λ_i(Π) as simply λ_i, where the dependence on Π is implicit.
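As a quick illustration of the Note above (the slack matrix depends only on opt(Π) − Π(x)), here is a minimal sketch that builds a slack matrix for hypothetical toy instances and checks that every entry is non-negative:

```python
from itertools import product

n = 3
# Hypothetical instances over 3 variables; the second contains a contradictory
# pair of constraints, so its optimum is 1/2.
instances = [
    [(0, 1, 2, +1)],
    [(0, 1, 2, +1), (0, 1, 2, -1)],
]

def val(x, cs):
    return sum((1 + a * x[i] * x[j] * x[k]) / 2 for i, j, k, a in cs) / len(cs)

assignments = list(product([-1, 1], repeat=n))
# Slack matrix: rows indexed by instances, columns by assignments,
# entry = opt(Pi) - Pi(x), independent of any particular linearization.
slack = [[max(val(y, cs) for y in assignments) - val(x, cs)
          for x in assignments] for cs in instances]

assert all(s >= -1e-9 for row in slack for s in row)  # slacks are non-negative
```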

Recall that the degree-d Sherali-Adams hierarchy computes an instance Π ∈ 3XOR_n if opt(Π) − Π can be written as a non-negative linear combination of non-negative d-juntas q_i,

opt(Π) − Π = Σ_{i∈I} λ_i q_i.

This representation is superficially similar, and one might wonder if there is a way to approximate an extended formulation witness with a Sherali-Adams witness. Obviously, it would be too much to hope that each of the non-negative functions in the extended formulation witness could be well approximated by a non-negative junta. Surprisingly, Chan, Lee, Raghavendra and Steurer [1] showed that after a specialized random restriction, the resulting q_i can be well approximated by non-negative d-juntas. Using this, they are able to lift Sherali-Adams lower bounds to extension complexity lower bounds. The transformation works for the whole class of constraint satisfaction problems (CSPs), but we will prove it for the special case of 3XOR.

Theorem 2 ([1]). Suppose that the d(n)-round Sherali-Adams relaxation cannot compute 3XOR_n. Then, for all sufficiently large n, no extended formulation of size at most N^{d(n)/2} can compute 3XOR_N, for N = n^{10 d(n)}.

We begin with the family of 3XOR_N instances over N variables, and some extended formulation P' of size r. By Yannakakis' theorem above, each instance Π ∈ 3XOR_N can be written as L(Π) − Π = λ_0 + Σ_{i=1}^{r} λ_i q_i, where each q_i is a function {−1,1}^N → R_{≥0}. Our goal is to write (a restriction of) L(Π) − Π as a non-negative linear combination of d-juntas plus some small error term. The proof proceeds in three steps.

1. First, we show that we can restrict our attention to the q_i that are sufficiently smooth (the infinity norm of these functions is bounded).

2. Then, we show that each of these q_i can be approximated by a (roughly) N^{0.2}-junta q'_i, such that the error on the low-degree Fourier coefficients of q_i − q'_i is small. Here we crucially use the fact that degree-d Sherali-Adams can only reason about monomials of degree up to d. This step will incur some error, but we will show that this error goes to 0 as n goes to infinity.

3. Up until now, the proof works for any instance Π ∈ 3XOR_N. We now fix a particular instance, which will allow us to make the connection to Sherali-Adams lower bounds. Let Π_0 ∈ 3XOR_n be a hard instance for Sherali-Adams on n variables. To obtain the instance Π ∈ 3XOR_N, we plant Π_0 at random inside a larger space of N variables by picking a subset of n of the variables and defining the constraints of Π_0 on them; the remaining N − n variables will remain unconstrained. Finally, we argue that with high probability the set of significant coordinates of each q_i, when restricted to the variables on which Π_0 is defined, has size at most d, and so the existence of this extended formulation implies that degree-d Sherali-Adams computes this instance exactly.

Fourier Analysis

We will need several tools from Fourier analysis. We define the inner product between two n-variable functions f and g as ⟨f, g⟩ := E_{x∈{−1,1}^n}[f(x)g(x)], where the expectation is taken over the uniform distribution on {−1,1}^n. The Fourier representation of a function f : {−1,1}^n → R is its unique representation over the basis of parity functions χ_α := Π_{i∈α} x_i for α ⊆ [n]. We can represent f over this basis as

f = Σ_{α⊆[n]} f̂(α) χ_α,

where the Fourier coefficient f̂(α) is defined as the projection of f in the χ_α direction, f̂(α) := ⟨f, χ_α⟩. Intuitively, f̂(α) measures the correlation of f with the parity Π_{i∈α} x_i. Throughout, we will use a function's regular representation and its Fourier representation interchangeably. Furthermore, if f is non-negative and E_{x∈{−1,1}^n}[f(x)] = 1, then f is a density: x ↦ f(x)/2^n is a probability distribution over {−1,1}^n.

Step 1: Smooth Slack Functions

We will now prove the main theorem by following the three steps laid out previously. Again, suppose that we have an extended formulation P' of size r ≤ N^{d/2} which computes 3XOR_N exactly. By Yannakakis' theorem, for any Π ∈ 3XOR_N we can write L(Π) − Π as a non-negative combination of slack functions,

L(Π) − Π = λ_0 + Σ_{i=1}^{r} λ_i q_i,

where λ_i ≥ 0 and q_i : {−1,1}^N → R_{≥0}. Furthermore, we can normalize each q_i, writing q_i(x) = γ_i q̄_i(x) for some γ_i ∈ R_{≥0} such that E[q̄_i] = 1; to keep the notation light we will write q_i for the normalized function q̄_i from now on. That is,

L(Π) − Π = λ_0 + Σ_{i=1}^{r} (λ_i γ_i) q_i.

Define the set

Q := {i : ‖q_i‖_∞ ≤ N^d}

of the q_i which are fairly smooth. Recall that d is the degree of the Sherali-Adams proof we are trying to obtain. We will show that restricting attention to the set of functions Q incurs only a small additive error. We can decompose the previous sum into

L(Π) − Π = λ_0 + Σ_{i∈Q} (λ_i γ_i) q_i + Σ_{j∈[r]∖Q} (λ_j γ_j) q_j.
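The next two steps manipulate Fourier coefficients q̂(α) throughout. As a concrete reference, here is a minimal sketch of computing one directly from the definition above (brute force, so sensible only for tiny n):

```python
from itertools import product
from math import prod

def fourier_coefficient(f, alpha, n):
    # f^(alpha) = E_x[f(x) * chi_alpha(x)] over the uniform distribution
    return sum(f[x] * prod((x[i] for i in alpha), start=1)
               for x in product([-1, 1], repeat=n)) / 2 ** n

# Example: f(x) = x_0 * x_1 has all its Fourier mass on alpha = {0, 1}.
n = 3
f = {x: x[0] * x[1] for x in product([-1, 1], repeat=n)}
print(fourier_coefficient(f, (0, 1), n))  # 1.0
print(fourier_coefficient(f, (0,), n))    # 0.0
```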

Because the value of opt(Π) − Π(x) ∈ [0,1] for every x ∈ {−1,1}^N and Π ∈ 3XOR_N, and because λ_j, γ_j ≥ 0, q_j is non-negative with E[q_j] = 1, and ‖q_j‖_∞ > N^d for every j ∉ Q, we must have λ_j γ_j < N^{−d} for every such j and every instance Π ∈ 3XOR_N (evaluate both sides at the point where q_j attains its maximum). Because of this, Σ_{j∈[r]∖Q} (λ_j γ_j) q_j cannot contribute very much, and we will treat it as a small additive error term, which we will denote by ε(Π). Later, we will bound its value. So

L(Π) − Π = λ_0 + Σ_{i∈Q} (λ_i γ_i) q_i + ε(Π).

Step 2: Approximate the Slack Functions by High-Degree Juntas

The aim now is to show that the smooth slack functions q_i for i ∈ Q can be well approximated by (large) juntas. For this, we will use a density version of Chang's Lemma, which follows from the entropic proof of Chang's Lemma in Impagliazzo, Moore and Russell [2].

Lemma 2 (Chang's Lemma). Let q be a density on {−1,1}^N with entropy at least N − t for some t ≥ 0, let σ > 0, and define R = {α : |q̂(α)| ≥ σ}. Then R spans a space of dimension less than 2t/σ².

A consequence of Chang's Lemma is the following.

Lemma 3. If q_i has entropy at least N − d log N, then for any σ > 0 there exists a set J(q_i) ⊆ [N] with |J(q_i)| ≤ 2d² log N / σ² such that for every α ⊄ J(q_i) with |α| ≤ d, we have |q̂_i(α)| < σ.

Proof. Consider S = {α : |α| ≤ d, |q̂_i(α)| ≥ σ}, and let S' be a maximal set of linearly independent elements of S (viewing each α as a vector over F_2). The density version of Chang's Lemma, after setting t = d log N, gives |S'| ≤ 2d log N / σ². Let J(q_i) = ∪_{α∈S'} α; then |J(q_i)| ≤ 2d² log N / σ², because each α contains at most d elements. Since every α ∈ S lies in the span of S', every α ∈ S is contained in J(q_i); it follows that for all α ⊄ J(q_i) with |α| ≤ d, we have |q̂_i(α)| < σ. □

This lemma says that we can decompose any high-entropy q_i into two parts q'_i and e_i, where

q'_i = Σ_{α⊆J(q_i)} q̂_i(α) χ_α,  and  e_i = Σ_{α⊄J(q_i)} q̂_i(α) χ_α.

That is, q'_i depends only on the set of variables in J(q_i), and in e_i the Fourier coefficients of all correlations of degree up to d are very small. Beyond degree d we have no control over the magnitude of the Fourier coefficients of e_i. However, recall that the degree-d Sherali-Adams hierarchy can only perceive correlations of degree up to d; because our end goal is to convert this into a Sherali-Adams proof, this is a non-issue for us.

Therefore, if we could ensure that each of the q'_i were a d-junta, that is, that |J(q_i)| ≤ d, and that the extra error e_i was small, then the proof would be finished: we would have arrived at a representation of L(Π) − Π consisting of d-juntas plus a small additive error.
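The decomposition q_i = q'_i + e_i can be carried out mechanically once J(q_i) is known. A minimal sketch (brute force over all Fourier coefficients, hypothetical toy density):

```python
from itertools import product, combinations
from math import prod

def fhat(q, alpha, n):
    # Fourier coefficient of q on the set alpha
    return sum(q[x] * prod((x[i] for i in alpha), start=1)
               for x in product([-1, 1], repeat=n)) / 2 ** n

def junta_split(q, J, n):
    # Return (q', e): q' carries the coefficients on alpha contained in J,
    # e carries the rest, so q = q' + e pointwise.
    alphas = [a for s in range(n + 1) for a in combinations(range(n), s)]
    coef = {a: fhat(q, a, n) for a in alphas}
    qp, e = {}, {}
    for x in product([-1, 1], repeat=n):
        chis = {a: prod((x[i] for i in a), start=1) for a in alphas}
        qp[x] = sum(coef[a] * chis[a] for a in alphas if set(a) <= set(J))
        e[x] = q[x] - qp[x]
    return qp, e

n = 3
q = {x: 1 + 0.5 * x[0] * x[1] + 0.25 * x[1] * x[2]
     for x in product([-1, 1], repeat=n)}       # a toy density: E[q] = 1, q >= 0
qp, e = junta_split(q, J=[0, 1], n=n)
# q' keeps the constant and the {0,1} term; the {1,2} term lands in e.
```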

Unfortunately, because we need the total error contributed by the e_i to tend to 0 as n → ∞, it turns out that the largest that we will be able to set σ and still achieve this is

σ = (16 d² n log N / √N)^{1/2} = 4d (n log N)^{1/2} / N^{1/4}.

Under Lemma 3, this only guarantees that each q'_i is a (√N / 8n)-junta, which is roughly N^{0.2} when the final numbers are plugged in.

Finally, we verify that each q_i with i ∈ Q indeed has high enough entropy to satisfy the hypothesis of Lemma 3:

H(q_i) = Σ_{x∈{−1,1}^N} (q_i(x)/2^N) log(2^N / q_i(x)) ≥ Σ_{x∈{−1,1}^N} (q_i(x)/2^N) log(2^N / ‖q_i‖_∞) ≥ log(2^N / N^d) = N − d log N,

where we used that ‖q_i‖_∞ ≤ N^d for i ∈ Q and the fact that E[q_i] = 1.

So far we have achieved a representation of the form

L(Π) − Π = λ_0 + Σ_{i∈Q} (λ_i γ_i)(q'_i + e_i) + ε(Π),

where the e_i are error terms whose Fourier coefficients corresponding to correlations of degree up to d are bounded by σ, and the q'_i are (√N / 8n)-juntas.

Step 3: Random Restriction to a Hard Instance for Sherali-Adams

The final step is to reduce these large juntas to d-juntas. To do this, we will employ a special random restriction which restricts to an instance Π_0 ∈ 3XOR_n for which we have Sherali-Adams lower bounds. Note that until this point, the steps of the proof have not relied on the particular instance of 3XOR_N. We will now restrict attention to a particular sub-family of instances. Consider an instance Π_0 of 3XOR_n on n variables, where n is much smaller than N (Π_0 should be thought of as a hard instance for Sherali-Adams). To create our instance Π, we will randomly plant Π_0 inside a larger space of unconstrained variables by picking a subset S of n variables and defining the constraints of Π_0 on them. The idea is that since the only constraints in Π are those corresponding to Π_0, we have L(Π) = opt(Π) = opt(Π_0). Now, because each of the juntas q'_i depends on at most √N / 8n variables, if we restrict to the variables of Π_0, with high probability only a small fraction of the variables on which q'_i depends will remain. This can be seen in Figure 2. This will be done in the following lemma; recall that in Step 2, using Chang's Lemma, we decomposed q_i = q'_i + e_i.
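As an aside before carrying out the restriction, the entropy calculation above is easy to check numerically. A sketch computing the entropy of a density on {−1,1}^N and comparing it against N − log₂‖q‖_∞ (base-2 logs, toy data):

```python
from itertools import product
from math import log2

def density_entropy(q, N):
    # H(q) = sum_x (q(x)/2^N) * log2(2^N / q(x)), over the support of q
    return sum((qx / 2 ** N) * log2(2 ** N / qx) for qx in q.values() if qx > 0)

# Toy density with ||q||_inf = 2: uniform on the half-cube {x : x_0 = 1}.
N = 3
q = {x: (2.0 if x[0] == 1 else 0.0) for x in product([-1, 1], repeat=N)}
print(density_entropy(q, N), ">=", N - log2(2))  # 2.0 >= 2.0, tight here
```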

Figure 2: The intersection between the variable set of each junta q'_i and the restricted set S on which we will plant Π_0.

Lemma 4. There exists a set S ⊆ [N] of size n such that for each q_i with i ∈ Q, the set J(q_i) from Lemma 3 satisfies |J(q_i) ∩ S| ≤ d, and for all α ⊆ S ∖ J(q_i) with |α| ≤ d,

|q̂_i(α)| ≤ (16 d² n log N / √N)^{1/2}.

For the proof, we will need the following inequality. Let X_1, ..., X_m be i.i.d. {0,1}-random variables with E[X_i] = p. Then

Pr[Σ_{i=1}^{m} X_i ≥ t] ≤ (pm)^t.    (3)

Proof. We will choose the set S as follows (see the sketch after this list):

1. Uniformly at random, pick a partition of [N] into sets S_1, ..., S_n, each of size N/n.

2. For each i ∈ [n], pick a variable v_i from S_i uniformly at random.

3. Let S = {v_i : i ∈ [n]}.
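A sketch of this sampling procedure (names illustrative). Each coordinate lands in S with probability n/N, so a junta on |J| ≤ √N/8n coordinates meets S in fewer than d of them with high probability:

```python
import random

def sample_restriction(N, n, rng=random):
    # Partition [N] into n random blocks of size N/n, then pick one
    # representative per block; S is the set of representatives.
    assert N % n == 0
    coords = list(range(N))
    rng.shuffle(coords)
    block = N // n
    return sorted(rng.choice(coords[i * block:(i + 1) * block])
                  for i in range(n))

print(sample_restriction(N=64, n=4))  # one coordinate from each of 4 blocks
```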

In Step 2 we argued, using Lemma 3, that we could decompose q_i = q'_i + e_i, where each q'_i is a (√N / 8n)-junta which depends on a set of coordinates J(q_i), and for every α ⊆ [N] with |α| ≤ d, |ê_i(α)| ≤ (16 d² n log N / √N)^{1/2}. We will show that with positive probability, the intersection of each of the sets J(q_i) with the set S has size at most d. For each variable l ∈ J(q_i), let X_l be the indicator of the event l ∈ S. Then E[X_l] = n/N, because each element of S is chosen uniformly at random, and so

Pr[|J(q_i) ∩ S| ≥ d] = Pr[Σ_{l∈J(q_i)} X_l ≥ d] ≤ ((n/N) · |J(q_i)|)^d ≤ (1/(8√N))^d = 8^{−d} N^{−d/2},

where the second inequality follows from inequality (3) above (the X_l are negatively correlated under this sampling scheme, which only helps the bound). Finally, because we have assumed that our original extended formulation has size at most N^{d/2}, we have |Q| ≤ N^{d/2}, and so taking a union bound over all J(q_i) for i ∈ Q gives total failure probability at most 8^{−d} < 1, completing the proof. □

Finally, we construct the instance Π ∈ 3XOR_N as follows. Let Π_0 be an instance of 3XOR_n on n variables. Apply Lemma 4 to obtain a subset S = {v_1, ..., v_n} ⊆ [N]. Define the constraints of Π as the constraints of Π_0 on the variables {v_1, ..., v_n}; the remaining N − n variables are left unconstrained.

Let Ẽ* be the degree-d Sherali-Adams pseudo-expectation which achieves the optimal value on Π_0,

SA_d[Π_0] = max_{Ẽ} Ẽ[Π_0] = Ẽ*[Π_0],

where the maximum is over degree-d pseudo-expectations and we think of Π_0 as its representation as a multilinear polynomial, so that we can apply Ẽ* to it. Furthermore, we can represent each of the functions q'_i as a multilinear polynomial by taking its Fourier transform; we will think of the q'_i as having that representation from now on, so that we can apply Ẽ* to them.

We now plant Ẽ* on the set of variables S: that is, we define Ẽ* on the variables in S (identified with the variables of Π_0) and extend it to have Fourier coefficient 0 on all terms involving variables outside of S. To do this, we note that Π is unconstrained on the variables outside of S, and therefore we define the underlying pseudo-distribution to be uniform on all variables outside of S. Applying Ẽ* to both sides of the decomposition from Step 2, we arrive at

Ẽ*[L(Π) − Π(x)] = λ_0 + Σ_{i∈Q} λ_i γ_i (Ẽ*[q'_i] + Ẽ*[e_i]) + Ẽ*[ε(Π)]
L(Π) − SA_d[Π_0] = λ_0 + Σ_{i∈Q} λ_i γ_i (Ẽ*[q'_i] + Ẽ*[e_i]) + Ẽ*[ε(Π)].

Now, because Ẽ* gives non-zero value only to terms over the variables of S, we have Ẽ*[q'_i] = Ẽ*[q'_i|_S], and so, by Lemma 4, we know that q'_i depends on at most d variables in S, so it is a non-negative d-junta. Therefore Ẽ*[q'_i] ≥ 0, and so

L(Π) ≥ SA_d[Π_0] + Σ_{i∈Q} λ_i γ_i Ẽ*[e_i] + Ẽ*[ε(Π)].

Now, if we can show that the error terms Σ_{i∈Q} λ_i γ_i Ẽ*[e_i] + Ẽ*[ε(Π)] go to 0 as n → ∞, then we will arrive at a representation of the form L(Π) ≥ SA_d[Π_0] − o(1), and so Sherali-Adams lower bounds will imply extension complexity lower bounds.

Bounding the Error Terms

All that is left is to show that the error terms go to 0 as n → ∞. We begin by bounding

Ẽ*[ε(Π)] = Σ_{j∈[r]∖Q} (λ_j γ_j) Ẽ*[q_j],    (4)

the error term that we obtained from Step 1. We will need a simple fact about pseudo-expectations.

Claim 1. For any degree-d pseudo-expectation Ẽ over {−1,1}^n, we have Σ_{α⊆[n]} |Ẽ[χ_α]| ≤ Σ_{i=0}^{d} (n choose i).

Proof. The Fourier representation of Ẽ is Ẽ = Σ_α Ẽ[χ_α] χ_α, where χ_α = Π_{i∈α} x_i. Because Ẽ is a pseudo-expectation, |Ẽ[χ_α]| ≤ 1 (this follows because a pseudo-expectation is the expectation over a pseudo-distribution). Because it is a degree-d pseudo-expectation, there can be at most Σ_{i=0}^{d} (n choose i) non-zero Fourier coefficients, each having absolute value at most 1. □

We can apply this claim as follows. View Ẽ* as a vector whose αth entry is the Fourier coefficient Ẽ*[χ_α], the value that Ẽ* assigns to χ_α. Similarly, view q_j as the vector of its Fourier coefficients q̂_j(α), the coefficients of its representation as a multilinear polynomial. Taking the inner product between these two vectors gives the evaluation Ẽ*[q_j] = Σ_α Ẽ*[χ_α] q̂_j(α). Because q_j is a density, that is, E[q_j] = 1 and q_j ≥ 0, every Fourier coefficient satisfies |q̂_j(α)| = |E[q_j χ_α]| ≤ E[q_j] = 1. Noting also that Ẽ* assigns non-zero values only to monomials of degree at most d, we can write

Ẽ*[q_j] ≤ Σ_α |Ẽ*[χ_α]| ≤ Σ_{i=0}^{d} (n choose i).
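Claim 1 can be sanity-checked on a genuine expectation: the degree-d truncation of E[χ_α] over any distribution is a valid degree-d pseudo-expectation. A sketch with an arbitrary toy support:

```python
from itertools import combinations
from math import comb, prod

n, d = 4, 2
support = [(1, 1, 1, 1), (1, -1, -1, 1), (-1, 1, -1, -1)]  # toy distribution

def E_chi(alpha):
    # E[chi_alpha] under the uniform distribution over the toy support
    return sum(prod((x[i] for i in alpha), start=1)
               for x in support) / len(support)

total = sum(abs(E_chi(a)) for s in range(d + 1)
            for a in combinations(range(n), s))
bound = sum(comb(n, i) for i in range(d + 1))
assert total <= bound  # here bound = 1 + 4 + 6 = 11
print(total, "<=", bound)
```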

During Step 1, we argued that because λ_j, γ_j ≥ 0 and ‖q_j‖_∞ > N^d for every j ∈ [r]∖Q, and because L(Π) − Π ∈ [0,1], we must have λ_j γ_j < N^{−d} for each j ∈ [r]∖Q. Putting all of this together, we have

Ẽ*[ε(Π)] = Σ_{j∈[r]∖Q} (λ_j γ_j) Ẽ*[q_j] ≤ r · N^{−d} · Σ_{i=0}^{d} (n choose i) ≤ r · N^{−d} · (en/d)^d ≤ (en/d)^d / N^{d/2},

where the second inequality follows because |[r]∖Q| ≤ r and Σ_{i=0}^{d} (n choose i) ≤ (en/d)^d, and the final inequality follows because r, the size of the extended formulation, is at most N^{d/2}.

Finally, we bound the error term Σ_{i∈Q} (λ_i γ_i) Ẽ*[e_i] in a similar way, by noting that

Ẽ*[e_i] ≤ Σ_{α⊆S} |Ẽ*[χ_α]| |ê_i(α)| = Σ_{α⊆S: |α|≤d} |Ẽ*[χ_α]| |ê_i(α)| ≤ (16 d² n log N / √N)^{1/2} Σ_{α⊆S: |α|≤d} |Ẽ*[χ_α]| ≤ (en/d)^d (16 d² n log N / √N)^{1/2},

where the equality follows because Ẽ*[χ_α] = 0 for all |α| > d, the second inequality follows from the bound on the degree-up-to-d Fourier coefficients of e_i from Lemma 4, and the final inequality follows from Claim 1. Therefore, we have

Σ_{i∈Q} (λ_i γ_i) Ẽ*[e_i] ≤ (en/d)^d (16 d² n log N / √N)^{1/2} Σ_{i∈Q} (λ_i γ_i) ≤ (en/d)^d · 4d (n log N)^{1/2} / N^{1/4},

where the final inequality follows from the observation that Σ_{i=1}^{r} λ_i γ_i ≤ 1. To see this, note that L(Π) − Π ∈ [0,1] and E[q_i] = 1, and apply an expectation over assignments in {−1,1}^N to both sides of L(Π) − Π(x) = λ_0 + Σ_{i=1}^{r} (λ_i γ_i) q_i(x).

Finishing Up

Finally, putting everything together, we have

L(Π) − SA_d[Π_0] = λ_0 + Σ_{i∈Q} λ_i γ_i (Ẽ*[q'_i] + Ẽ*[e_i]) + Ẽ*[ε(Π)]
  ≥ λ_0 + Σ_{i∈Q} (λ_i γ_i) Ẽ*[q'_i] − (en/d)^d · 4d (n log N)^{1/2} / N^{1/4} − (en/d)^d / N^{d/2}.

Therefore, we arrive at an expression of the form L(Π) ≥ SA_d[Π_0] − err_n.
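The two error terms can be evaluated numerically for small parameters. A sketch (natural logs; any fixed base only shifts constants) showing err_n shrinking as n grows, under the reconstruction of σ used above:

```python
from math import e, log

def err(n, d):
    # err_n = (en/d)^d * 4d * sqrt(n * log N) / N^(1/4) + (en/d)^d / N^(d/2),
    # with N = n^(10d)
    N = float(n) ** (10 * d)
    lead = (e * n / d) ** d
    return lead * 4 * d * (n * log(N)) ** 0.5 / N ** 0.25 + lead / N ** (d / 2)

for n in (4, 8, 16, 32):
    print(n, err(n, d=2))  # decreases toward 0 as n grows
```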

We now show that

err_n := (en/d)^d · 4d (n log N)^{1/2} / N^{1/4} + (en/d)^d / N^{d/2}

goes to 0 as n → ∞. Plugging in our value N = n^{10d}, so that log N = 10d log n and N^{1/4} = n^{5d/2}, we have

err_n = (en/d)^d · 4d (10 d n log n)^{1/2} / n^{5d/2} + (en/d)^d / n^{5d²} ≤ 4d (10 d log n)^{1/2} e^d · n^{d + 1/2 − 5d/2} + e^d · n^{d − 5d²} = o(1),

since the exponent d + 1/2 − 5d/2 = 1/2 − 3d/2 is at most −1 for all d ≥ 1.

Therefore, this theorem lifts Sherali-Adams degree-d lower bounds to extended formulation lower bounds of size N^{d/2}. Unfortunately, because we needed to set N = n^{10d}, the best lower bound that we can achieve this way (lifting a Sherali-Adams lower bound of degree Ω(n)) is quasi-polynomial, of the form N^{Θ(log N / log log N)}. The bottleneck, which limits Chan et al. to only obtaining quasi-polynomial size lower bounds, is the application of Chang's Lemma in Step 2. Later, Kothari, Meka and Raghavendra [3] overcame this barrier, obtaining truly exponential lower bounds.

References

[1] Siu On Chan, James R. Lee, Prasad Raghavendra, and David Steurer. Approximate constraint satisfaction requires large LP relaxations. J. ACM, 63(4):34:1-34:22, 2016.

[2] Russell Impagliazzo, Cristopher Moore, and Alexander Russell. An entropic proof of Chang's inequality. SIAM J. Discrete Math., 28(1):173-176, 2014.

[3] Pravesh K. Kothari, Raghu Meka, and Prasad Raghavendra. Approximating rectangles by juntas and weakly exponential lower bounds for LP relaxations of CSPs. In Proceedings of the 49th Annual ACM Symposium on Theory of Computing (STOC), pages 590-603, 2017.

[4] Mihalis Yannakakis. Expressing combinatorial optimization problems by linear programs (extended abstract). In Proceedings of the 20th Annual ACM Symposium on Theory of Computing (STOC), pages 223-228, 1988.
