Combinatorial Optimization 1 Positive Semi-Definite Programming and applications to approximation Guy Kortsarz
Combinatorial Optimization 2 Positive Semi-Definite (PSD) matrices, a definition
Note that we deal only with symmetric matrices, because in that case the eigenvalues are real. It does not make sense to talk about a non-symmetric PSD matrix. The following are equivalent definitions:
1. The symmetric matrix A is PSD.
2. There is a matrix B so that B^T B = A.
3. All the eigenvalues of the matrix are non-negative.
4. For every vector v, v^T A v ≥ 0.
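The equivalence of these definitions can be spot-checked numerically. A minimal sketch with NumPy, on a small made-up matrix (the matrix B below is just illustrative data):

```python
import numpy as np

# Build a PSD matrix via definition 2: A = B^T B for some matrix B.
B = np.array([[1.0, 2.0],
              [0.0, 3.0]])
A = B.T @ B

# Definition 3: all eigenvalues are non-negative.
eigenvalues = np.linalg.eigvalsh(A)   # eigvalsh is for symmetric matrices
assert np.all(eigenvalues >= -1e-9)

# Definition 4: v^T A v >= 0 for every v (spot-checked on random vectors).
rng = np.random.default_rng(0)
for _ in range(100):
    v = rng.standard_normal(2)
    assert v @ A @ v >= -1e-9
```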
Combinatorial Optimization 3 What is PSD programming?
Consider a collection of numbers y_ij for 1 ≤ i,j ≤ n. We use them in an LP and add the constraint Y ⪰ 0, which means that as a matrix Y = (y_ij) is PSD. Note that the definition v^T A v ≥ 0 implies that if A and B are two PSD matrices (viewed as vectors of size n²) and a,b > 0, then aA + bB is PSD. This implies that any convex combination of PSD matrices is PSD, and so the PSD matrices, viewed as vectors in R^{n²}, form a convex set. Thus we may apply the Ellipsoid algorithm to check if Y ⪰ 0, provided we can translate this constraint into a collection of linear constraints and then find a violated constraint.
Combinatorial Optimization 4 Posing Y ⪰ 0 as a collection of linear constraints
We can give an alternative definition of Y ⪰ 0 by an infinite number of linear constraints: for every x ∈ R^n there is the constraint x^T Y x ≥ 0. Note that this is just a linear combination of the y_ij variables, since x is a constant vector. The number of linear constraints is infinite (this makes no difference). Finding a violated constraint means finding a negative eigenvalue λ < 0. If Y is not PSD then there is an eigenvalue λ < 0; let x be a corresponding eigenvector. Then x^T Y x = λ · x^T x < 0, because x^T x is clearly a positive number. Thus we found a violated constraint. PSD programming can be solved in polynomial time.
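The separation oracle described above can be sketched in a few lines: compute the smallest eigenvalue, and if it is negative, return its eigenvector as a violated linear constraint. A toy sketch (the example matrix Y is made up):

```python
import numpy as np

def separate(Y, tol=1e-9):
    """Either certify Y is PSD (return None) or return x with x^T Y x < 0."""
    eigenvalues, eigenvectors = np.linalg.eigh(Y)  # ascending eigenvalues
    if eigenvalues[0] >= -tol:
        return None                    # no violated constraint: Y is PSD
    return eigenvectors[:, 0]          # eigenvector of a negative eigenvalue

Y = np.array([[1.0, 2.0],
              [2.0, 1.0]])             # eigenvalues 3 and -1, so not PSD
x = separate(Y)
assert x is not None and x @ Y @ x < 0  # the constraint x^T Y x >= 0 is violated
```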
Combinatorial Optimization 5 Summary
We can use n² variables y_ij and the constraint Y ⪰ 0 in a program. Finding a violated constraint for Y ⪰ 0 can be done in polynomial time. There is another way to look at it: Y being PSD is equivalent to the existence of a matrix D so that D^T D = Y, namely y_ij = v_i^T v_j for vectors v_i, v_j ∈ R^n (the columns of D). So we can also look at this as vector programming: instead of y_ij we write v_i^T v_j, with v_i, v_j ∈ R^n.
Combinatorial Optimization 6 How can we use the fact that we can handle products such as v_i^T v_j?
Here is a stronger relaxation for Max-Cut. In what follows v_i, v_j ∈ R^n.
Maximize Σ_{ij ∈ E} (1 − v_i^T v_j)/2
subject to v_i^T v_i = 1 for every i.
Note that if we denote y_ij = v_i^T v_j, then Y ⪰ 0. Let the optimum cut be C, V∖C. Why is the above a relaxation? Set v_i = (1,0,0,...,0) if i ∈ C, and set v_i = (−1,0,0,...,0) if i ∈ V∖C.
Combinatorial Optimization 7 Why is this a relaxation, continued
Note that an edge ij in the cut adds a value of 2 before dividing: if v_i^T v_j = −1 it adds 1 − v_i^T v_j = 2. If i,j are on the same side, it adds 1 − v_i^T v_j = 0. We divide by 2 to get exactly the cut value. Thus the maximum of the above program is at least the max cut. It is very important to note that we are not able to force a solution of low dimension; we can never require a vector in which only one entry is non-zero. All of this is due to Goemans and Williamson. They knew the above for a long time, but how to go from vectors to a partition? That took, I think, about 3 years.
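The relaxation argument above can be checked on a toy instance: plugging in the integral vectors ±(1,0,...,0) makes the SDP objective equal the cut value exactly. A sketch (the 4-cycle and the cut are made-up example data):

```python
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]   # a 4-cycle
C = {0, 2}                                 # one side of a cut; all 4 edges cross
n = 4
V = np.zeros((n, n))
for i in range(n):
    V[i, 0] = 1.0 if i in C else -1.0      # v_i = +-(1,0,...,0) per the slide

objective = sum((1 - V[i] @ V[j]) / 2 for i, j in edges)
cut_size = sum(1 for i, j in edges if (i in C) != (j in C))
assert objective == cut_size == 4          # SDP value of this solution = cut value
```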
Combinatorial Optimization 8 What is the problem?
We are not going to get vectors that have a single non-zero entry. We cannot impose such a thing (it is NP-hard). The solution will be some complex collection of vectors in R^n. The question is how to translate the vectors into a partition. At times, when we go from real numbers to vectors, the answer can become completely meaningless. But this is not the case for Max-Cut.
Combinatorial Optimization 9 Rounding
Let opt_v (v for vectors) be the optimum of the PSD program defined above and opt the value of the maximum cut. Thus opt_v ≥ opt. Note that the vectors are unit vectors in R^n, because of the constraint v_i^T v_i = 1. The algorithm is:
1) Choose a random unit vector r on the unit sphere.
2) Place in S all i so that r^T v_i ≥ 0, and place the rest of the vertices in V∖S.
Remark: We later discuss at length how to choose a random vector on a sphere.
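The random-hyperplane rounding step can be sketched as follows. The vectors V would come from solving the SDP; here they are toy unit vectors chosen by hand:

```python
import numpy as np

def round_hyperplane(V, rng):
    """Round SDP vectors: pick a random unit vector r, put i in S iff r^T v_i >= 0."""
    n, d = V.shape
    r = rng.standard_normal(d)
    r /= np.linalg.norm(r)                     # random unit vector on the sphere
    return {i for i in range(n) if V[i] @ r >= 0}

rng = np.random.default_rng(1)
V = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]])  # toy "solution" vectors
S = round_hyperplane(V, rng)
# Vertices 0 and 1 have opposite vectors, so the hyperplane always separates them.
assert (0 in S) != (1 in S)
```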
Combinatorial Optimization 10 Moving to the plane
Consider just two vectors. Then we can map them to the plane, because the dimension they span is only 2.
Figure 1: The angle α between two vectors v_1, v_2 on the unit circle.
Combinatorial Optimization 11 Intuition
Consider a random unit vector r and an edge ij. When is sign(r^T v_i) different from sign(r^T v_j)? We can always think of the angle between v_i, v_j as at most π. When is cos(α) > 0? If α < π/2. When is cos(α) < 0? If α > π/2. Consider the following figure.
Combinatorial Optimization 12 Separating r vectors
Figure 2: Separating r values: the unit circle with v_1, v_2 at angle α, and the four directions r_1, r_2, r_3, r_4 at angle π/2 from them. cos(α) is positive until π/2 and negative from π/2 to π.
Combinatorial Optimization 13 Explanation
When r = r_1 there is an angle of π/2 between r and v_2. Say that we move from r_1 toward r_2, which has angle π/2 with v_1. When r is strictly between r_1 and r_2, then r·v_1 > 0 and r·v_2 < 0, because the angle between r and v_1 is less than π/2 while the angle between r and v_2 is more than π/2. After r passes r_2, the angle between r and both vectors is more than π/2, so the signs agree. When r gets to r_3 the angle toward v_2 becomes less than π/2 while the angle toward v_1 stays more than π/2, until we get to r_4. Thus 2α worth of choices of r out of 2π give different signs. We proved: the probability that i and j are separated is 2α/2π = α/π. Thus the vector program will try to make the angle between v_i and v_j large.
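The α/π separation probability can be spot-checked by simulation. A toy Monte Carlo sketch (the angle alpha is an arbitrary example value, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 2.0                                  # some angle in (0, pi)
v1 = np.array([1.0, 0.0])
v2 = np.array([np.cos(alpha), np.sin(alpha)])  # unit vector at angle alpha from v1

# Sample many random hyperplane directions and count how often the signs differ.
R = rng.standard_normal((200_000, 2))
separated = np.mean((R @ v1 >= 0) != (R @ v2 >= 0))
assert abs(separated - alpha / np.pi) < 0.01   # empirical rate matches alpha/pi
```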
Combinatorial Optimization 14 So what does the PSD do?
It increases the chance of i and j contributing to the objective function when the angle between them is large. The PSD is stronger than the LP because it finds the best collection of vectors with respect to having a large angle between v_i and v_j whenever ij is an edge. Define S_ij as 1 if ij is a cut edge and 0 otherwise. We showed that E(S_ij) = Pr(ij is in the cut) = α_ij/π, where α_ij is the angle between v_i, v_j. Let T (for Total) be Σ_{ij ∈ E} S_ij. Thus E(T) = Σ_{ij ∈ E} Pr(ij is in the cut) = Σ_{ij ∈ E} α_ij/π.
Combinatorial Optimization 15 Putting T in terms of v_i^T v_j
Lemma: The probability that i and j are separated is arccos(v_i^T v_j)/π.
Proof: Since v_i^T v_j = cos(α_ij) and α = arccos(cos(α)), the probability of separation is arccos(v_i^T v_j)/π.
The following fact can be verified by calculus: arccos(x)/π ≥ 0.878 · (1 − x)/2.
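The calculus fact can be spot-checked numerically on a fine grid over [−1, 1] (a sanity check, not a proof):

```python
import math

# Check arccos(x)/pi >= 0.878 * (1 - x)/2 on a fine grid of x in [-1, 1].
xs = [-1 + 2 * k / 10000 for k in range(10001)]
ratio_holds = all(
    math.acos(x) / math.pi >= 0.878 * (1 - x) / 2 - 1e-12 for x in xs
)
assert ratio_holds
```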
Combinatorial Optimization 16 The approximation ratio
Theorem: The ratio of the algorithm is 0.878.
Proof: The contribution of edge ij to the vector program is (1 − v_i^T v_j)/2. The contribution of i and j to the expectation is P(i and j are separated) = arccos(v_i^T v_j)/π ≥ 0.878 · (1 − v_i^T v_j)/2. Thus the contribution of ij to the expectation, namely the probability that they are separated, is at least 0.878 times their contribution to the PSD value. The ratio of 0.878 follows. While it is not immediate, the algorithm can be derandomized. The ratio is tight under the Unique Games Conjecture.
Combinatorial Optimization 17 Coloring a 3-colorable graph
This is a promise problem: we cannot check that a graph can be colored by 3 colors. Thus the one who produces the graph starts with 3 independent sets as the vertices and then adds edges in an arbitrary way (just not within an independent set). This problem is definitely easier than Min Coloring, which was shown to be n^{1−ǫ} inapproximable by Feige et al. We shall discuss a simple Õ(√n) ratio algorithm in the next slides, with Õ() ignoring polylogarithmic factors. This already shows the problem is easier than general coloring.
Combinatorial Optimization 18 A simple algorithm
Recall that we can find an independent set of size n/(d+1), with d the average degree. This yields, by a standard analysis, an O(d log n) approximation. But d may be large. We also use the fact that 2-coloring a graph is a polynomial problem.
The algorithm:
1. While there is a vertex v of degree at least √n do
(a) 2-color N(v) with two new colors.
2. Use the O(d log n) coloring. /* When we get to this line, the maximum degree is at most √n */
Combinatorial Optimization 19 Analysis
The neighborhood of every vertex in a 3-colorable graph is 2-colorable, and 2-coloring a graph is a polynomial problem. Each time we find a vertex of degree at least √n, we use two new colors and color at least √n vertices with them. Thus the number of colors used in the loop is at most 2n/√n = 2√n. When the loop ends, the maximum degree, and thus the average degree d as well, is at most √n, and a Õ(√n) ratio follows. We proved a Õ(√n) approximation ratio. Unfortunately there is no real hardness result for coloring 3-colorable graphs.
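The two-phase algorithm above can be sketched in pure Python. This is a minimal sketch, assuming a graph given as a dict mapping each vertex to a set of neighbors; the BFS 2-coloring plays the role of step 1(a), and a greedy pass stands in for the final O(d log n) coloring:

```python
import math
from collections import deque

def two_color(vertices, adj):
    """BFS 2-coloring of the subgraph induced by `vertices` (assumed bipartite)."""
    color, verts = {}, set(vertices)
    for s in verts:
        if s in color:
            continue
        color[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u] & verts:
                if w not in color:
                    color[w] = 1 - color[u]
                    queue.append(w)
    return color

def color_3colorable(adj):
    n = len(adj)
    remaining = set(adj)
    coloring, palette = {}, 0
    # Phase 1: while some vertex has >= sqrt(n) uncolored neighbors,
    # 2-color its neighborhood with two fresh colors and remove it.
    while True:
        v = next((u for u in remaining
                  if len(adj[u] & remaining) >= math.isqrt(n)), None)
        if v is None:
            break
        nbhd = adj[v] & remaining
        for u, c in two_color(nbhd, adj).items():
            coloring[u] = palette + c
        palette += 2
        remaining -= nbhd
    # Phase 2: all remaining degrees are < sqrt(n); color greedily
    # with fresh colors (standing in for the O(d log n) step).
    for u in remaining:
        used = {coloring.get(w) for w in adj[u]}
        c = palette
        while c in used:
            c += 1
        coloring[u] = c
    return coloring

# Usage on a 5-cycle, which is 3-colorable.
cycle = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
coloring = color_3colorable(cycle)
assert all(coloring[i] != coloring[j] for i in cycle for j in cycle[i])
```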
Combinatorial Optimization 20 Finding an independent set of size Ω(n/(Δ^{1/3}·√(log Δ)))
Interestingly, we need to know the theory of the normal distribution. We shall use only the normal distribution with mean 0 and variance 1. The way to think of the normal distribution: say that X_i is 1 with probability 1/2 and −1 with probability 1/2, and consider Σ_{i=1}^n X_i/√n with n going to infinity. Clearly the mean is 0 and the standard deviation is 1. Almost all the probability is concentrated around the value X = 0. When n goes to infinity, the Gauss bell curve emerges. The density of the standard normal distribution, with mean 0 and variance 1, is f(x) = e^{−x²/2}/√(2π).
Combinatorial Optimization 21 Some facts
Fact 1: The sum of two independent normal variables with means μ_1 and μ_2 and variances σ_1² and σ_2² is normal with mean μ_1 + μ_2 and variance σ_1² + σ_2².
Let ψ(y) = ∫_y^∞ f(x)dx be the tail probability. Here is another fact we will not prove.
Fact 2: For every x > 0, f(x)·(1/x − 1/x³) ≤ ψ(x) ≤ f(x)/x.
Fact 3: Say that we choose an n-entry vector so that every entry is independently drawn from a normal distribution with mean 0 and variance 1, and normalize it by its length (which is about √n) to make it a unit vector. Then this yields a uniformly random vector on the unit sphere. From now on, when we say random vector we mean the above.
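Fact 2 can be spot-checked numerically: the tail ψ(x) = P(N(0,1) ≥ x) is expressible via the complementary error function as erfc(x/√2)/2, which the Python standard library provides:

```python
import math

def f(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def psi(x):
    """Standard normal tail probability: psi(x) = erfc(x / sqrt(2)) / 2."""
    return math.erfc(x / math.sqrt(2)) / 2

# Spot-check the sandwich f(x)(1/x - 1/x^3) <= psi(x) <= f(x)/x at a few points.
for x in [0.5, 1.0, 2.0, 3.0, 5.0]:
    assert f(x) * (1 / x - 1 / x**3) <= psi(x) <= f(x) / x
```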
Combinatorial Optimization 22 Projections
Lemma: For a unit vector v and a random vector r with independent standard normal entries, r^T v has the standard normal distribution. This can be proved by the previous facts: r^T v = Σ_{i=1}^n v_i X_i, with v_i the i-th entry of v and X_i standard normal (mean 0 and variance 1). As we saw above, this gives mean 0 and variance Σ_{i=1}^n v_i² = 1, because v is a unit vector.
We now define the main notion: vector 3-coloring. This notion is different from 3-coloring; in fact a graph can be vector 3-colorable while its chromatic number is polynomially large.
Combinatorial Optimization 23 Vector 3-coloring
Assign to every vertex i a unit vector v_i so that if ij is an edge then v_i^T v_j ≤ −1/2. Such an arrangement of vectors is possible if the graph is 3-colorable: consider the 3 independent sets I_1, I_2, I_3 in the graph. Let u_1, u_2, u_3 be three vectors in the plane with angle 2π/3 (120 degrees) between each pair, and assign u_k to all of I_k. Clearly if ij is an edge then v_i^T v_j = cos(2π/3) = −cos(π/3) = −1/2.
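The three 120-degree vectors and their inner products can be verified directly:

```python
import math

# Three unit vectors in the plane at mutual angle 2*pi/3; each pair has
# inner product cos(2*pi/3) = -1/2, so assigning u_k to independent set I_k
# is a legal vector 3-coloring.
u = [(math.cos(2 * math.pi * k / 3), math.sin(2 * math.pi * k / 3))
     for k in range(3)]
for a in range(3):
    for b in range(a + 1, 3):
        dot = u[a][0] * u[b][0] + u[a][1] * u[b][1]
        assert abs(dot - (-0.5)) < 1e-12
```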
Combinatorial Optimization 24 An example of vector 3-coloring
Figure 3: Vector 3-coloring of a graph: the vertices of each independent set are mapped to one of three unit vectors with 120-degree angles between them.
Combinatorial Optimization 25 How to find a legal vector 3-coloring in polynomial time?
Use Positive Semi-Definite programming:
Minimize z
such that v_i^T v_j ≤ z for every edge ij,
and v_i^T v_i = 1 for every i.
We know that the optimum satisfies z ≤ −1/2, since a 3-colorable graph admits the three-vector solution above. Thus we can assume that we have a collection of vectors with angle at least 2π/3 between the endpoints of every edge.
Combinatorial Optimization 26 An algorithm to find a large independent set
Let θ be a threshold chosen later.
1. Find a vector 3-coloring.
2. Choose a random vector r (each entry independent standard normal).
3. Let S = {i : r^T v_i ≥ θ}.
4. Let G(S) be the graph induced by S.
5. As long as G(S) contains an edge e = uv, remove u from S.
6. Let S′ be the non-removed vertices.
7. Return S′.
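The steps above can be sketched as follows. This is a toy sketch: V and the edge list are made-up inputs (a triangle with its 120-degree vector 3-coloring), whereas in the real algorithm V would come from the SDP solver:

```python
import numpy as np

def independent_set(V, edges, theta, rng):
    """Keep vertices with large projection on a random direction,
    then delete one endpoint of every surviving edge."""
    r = rng.standard_normal(V.shape[1])      # random Gaussian direction
    S = {i for i in range(len(V)) if V[i] @ r >= theta}
    for u, v in edges:                       # make S independent
        if u in S and v in S:
            S.discard(u)
    return S

rng = np.random.default_rng(2)
angles = [0, 2 * np.pi / 3, 4 * np.pi / 3]
V = np.array([[np.cos(a), np.sin(a)] for a in angles])  # vector 3-coloring of K_3
S = independent_set(V, [(0, 1), (1, 2), (0, 2)], 0.5, rng)
# Any independent set of the triangle has at most one vertex.
assert len(S) <= 1
```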
Combinatorial Optimization 27 Some random variables
Let X = |S| and let Y = |E(S)|, the number of edges inside S. Let Z = |S′|. Then Z ≥ max{0, X − Y}, since we remove at most one vertex per edge, and so E(Z) ≥ E(X) − E(Y). We now show how to bound E(X) from below and E(Y) from above. As r^T v_i has the standard normal distribution, P(r^T v_i ≥ θ) = ψ(θ). Thus E(X) = n·ψ(θ). We now upper bound E(Y).
Combinatorial Optimization 28 Upper bounding Y
E(Y) = Σ_{edges ij} P(r^T v_i ≥ θ and r^T v_j ≥ θ) ≤ Σ_{edges ij} P(r^T v_i + r^T v_j ≥ 2θ). Note that the distribution of r^T(v_i + v_j) is not standard normal, because v_i + v_j is not a unit vector. However, define u = (v_i + v_j)/‖v_i + v_j‖. This is a unit vector, and so r^T u has the standard normal distribution.
Combinatorial Optimization 29 Analysis continued
Note that ‖v_i + v_j‖² = ‖v_i‖² + ‖v_j‖² + 2·v_i^T v_j. For an edge, v_i^T v_j ≤ −1/2. Thus ‖v_i + v_j‖² ≤ 1 + 1 − 1 = 1. We get that P(r^T v_i + r^T v_j ≥ 2θ) = P(r^T u ≥ 2θ/‖v_i + v_j‖) ≤ ψ(2θ). Recall that m, the number of edges, satisfies m ≤ nΔ/2 with Δ the maximum degree. Thus by linearity of expectation, E(Y) ≤ m·ψ(2θ) ≤ nΔ·ψ(2θ)/2. We get E(Z) ≥ E(X) − E(Y) ≥ n·ψ(θ) − nΔ·ψ(2θ)/2. We need to maximize this over θ.
Combinatorial Optimization 30 Analysis continued
Recall that we showed f(x)·(1/x − 1/x³) ≤ ψ(x) ≤ f(x)/x, and f(x) = e^{−x²/2}/√(2π). Set θ = √(2·ln Δ/3). Then e^{−θ²/2} = Δ^{−1/3} and e^{−2θ²} = Δ^{−4/3}, so Δ·ψ(2θ)/2 is roughly ψ(θ)/4 and the difference is Ω(ψ(θ)). We get that n·ψ(θ) − nΔ·ψ(2θ)/2 = Ω(n/(Δ^{1/3}·√(log Δ))).
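The balance achieved by this choice of θ can be checked numerically: for a range of Δ, the per-vertex quantity ψ(θ) − Δ·ψ(2θ)/2 stays within a constant factor of 1/(Δ^{1/3}·√(ln Δ)). A sanity-check sketch:

```python
import math

def psi(x):
    """Standard normal tail probability."""
    return math.erfc(x / math.sqrt(2)) / 2

for Delta in [10, 100, 1000, 10**4, 10**5]:
    theta = math.sqrt(2 * math.log(Delta) / 3)
    per_vertex = psi(theta) - Delta * psi(2 * theta) / 2   # E(Z)/n from the slide
    target = 1 / (Delta ** (1 / 3) * math.sqrt(math.log(Delta)))
    # per_vertex is Theta(target): bounded between constant multiples of it.
    assert 0.05 * target < per_vertex < 10 * target
```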
Combinatorial Optimization 31 The number of colors
From the above it follows that repeatedly extracting such independent sets gives an Õ(Δ^{1/3}) coloring. It is possible to combine this with removing large degrees (as in the Õ(√n) algorithm) and get Õ(n^{1/4}) colors. This, however, is not the best ratio. An approximation of O(n^{0.2111}) is the best(?) known, using the algorithm of ARV for sparsest cut. It may be possible that Lift-and-Project techniques will give a polylogarithmic ratio in time quasi-polynomial in n (as far as I know this is not known yet). Using vector coloring alone cannot give a polylogarithmic ratio: there are vector 3-colorable graphs whose chromatic number is n^{0.05}. The SDP does not catch the problem.