Introduction to Semidefinite Programming I: basic properties, and variations on the Goemans-Williamson approximation algorithm for max-cut
MFO seminar on Semidefinite Programming, May 30, 2010
Semidefinite programming duality / Complexity

Positive semidefinite matrices

Definition: For a symmetric $n \times n$ matrix $X$, the following conditions are equivalent:
1. $X$ is positive semidefinite (written $X \succeq 0$): all eigenvalues of $X$ are nonnegative.
2. $u^T X u \ge 0$ for all $u \in \mathbb{R}^n$.
3. $X = UU^T$ for some matrix $U \in \mathbb{R}^{n \times p}$.
4. For some vectors $v_1, \dots, v_n \in \mathbb{R}^p$, $X_{ij} = v_i^T v_j$ ($i, j \in [n]$). We say that $X$ is the Gram matrix of the $v_i$'s.
5. All principal minors of $X$ are nonnegative.

Definition: $X$ is positive definite (written $X \succ 0$) if all eigenvalues of $X$ are positive.
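A quick numerical sanity check of equivalences 1-4 (a NumPy sketch; the matrix and sample vectors are arbitrary illustrative data, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Conditions 3/4: build X as a Gram matrix X = U U^T.
U = rng.standard_normal((5, 3))
X = U @ U.T

# Condition 1: all eigenvalues of X are nonnegative (up to round-off).
min_eigenvalue = np.linalg.eigvalsh(X).min()

# Condition 2: u^T X u >= 0, checked on random sample vectors u.
min_quad = min(u @ X @ u for u in rng.standard_normal((100, 5)))

# Recover a Gram factorization from the spectral decomposition:
# X = V diag(w) V^T = G G^T with G = V diag(sqrt(w)).
w, V = np.linalg.eigh(X)
G = V * np.sqrt(np.clip(w, 0, None))
reconstruction_error = np.linalg.norm(G @ G.T - X)
```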
Notation

$S^n$: the space of $n \times n$ symmetric matrices.
$S^n_+$: the cone of positive semidefinite matrices.
$S^n_{++}$: the cone of positive definite matrices; $S^n_{++}$ is the interior of the cone $S^n_+$.

Trace inner product on $S^n$: $A \bullet B = \mathrm{Tr}(A^T B) = \sum_{i,j=1}^n A_{ij} B_{ij}$.

The PSD cone is self-dual: for $A \in S^n$,
$A \in S^n_+ \iff A \bullet B \ge 0 \;\; \forall B \in S^n_+.$
Primal/dual semidefinite programs

Given matrices $C, A_1, \dots, A_m \in S^n$ and a vector $b \in \mathbb{R}^m$:

Primal SDP: $p := \max_X \; C \bullet X$ such that $A_j \bullet X = b_j$ ($j = 1, \dots, m$), $X \succeq 0$.

Dual SDP: $d := \min_y \; b^T y$ such that $\sum_{j=1}^m y_j A_j - C \succeq 0$.

Weak duality: $p \le d$.
Proof: If $X$ is primal feasible and $y$ is dual feasible, then
$0 \le \Big( \sum_{j=1}^m y_j A_j - C \Big) \bullet X = \sum_j y_j (A_j \bullet X) - C \bullet X = b^T y - C \bullet X.$
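Weak duality can be checked numerically by reverse-engineering a feasible primal-dual pair (a NumPy sketch with random illustrative data; no SDP solver is needed, since feasibility is arranged by construction):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 3

def random_sym(n):
    M = rng.standard_normal((n, n))
    return (M + M.T) / 2

def random_psd(n):
    M = rng.standard_normal((n, n))
    return M @ M.T

# Pick X >= 0 and y first, then choose b and C so that both are feasible.
A = [random_sym(n) for _ in range(m)]
X = random_psd(n)                                   # primal feasible
b = np.array([np.trace(Aj @ X) for Aj in A])        # so that A_j . X = b_j
y = rng.standard_normal(m)
C = sum(yj * Aj for yj, Aj in zip(y, A)) - random_psd(n)  # sum_j y_j A_j - C >= 0

primal = np.trace(C @ X)   # C . X
dual = b @ y               # b^T y
gap = dual - primal        # weak duality predicts gap >= 0
```

The gap equals $(\sum_j y_j A_j - C) \bullet X$, the trace inner product of two PSD matrices, which is nonnegative, exactly as in the proof above.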
Analogy between LP and SDP

Given vectors $c, a_1, \dots, a_m \in \mathbb{R}^n$ and $b \in \mathbb{R}^m$:

Primal/dual LP:
$\max_x \; c^T x$ such that $a_j^T x = b_j$ ($j \le m$), $x \in \mathbb{R}^n_+$
$\min_y \; b^T y$ such that $\sum_{j=1}^m y_j a_j - c \ge 0$

Primal SDP: $\max_X \; C \bullet X$ such that $A_j \bullet X = b_j$ ($j = 1, \dots, m$), $X \in S^n_+$.

SDP is the analogue of LP, replacing the cone $\mathbb{R}^n_+$ by $S^n_+$. One recovers LP when $C$ and the $A_j$ are diagonal matrices.
Strong duality: $p = d$?

Strong duality always holds for LP, but for SDP we need some regularity condition (e.g., the Slater condition)!

Primal (P) / dual (D) SDPs, with $p \le d$:
(P) $p = \sup \; C \bullet X$ s.t. $A_j \bullet X = b_j$ ($j = 1, \dots, m$), $X \succeq 0$
(D) $d = \inf \; b^T y$ s.t. $\sum_{j=1}^m y_j A_j - C \succeq 0$

Strong duality theorem:
1. If (P) is strictly feasible (some $X \succ 0$ is feasible for (P)) and bounded ($p < \infty$), then $p = d$ and (D) attains its infimum.
2. If (D) is strictly feasible (some $y$ with $\sum_j y_j A_j - C \succ 0$) and bounded ($d > -\infty$), then $p = d$ and (P) attains its supremum.
Proof of 2.

Assume $d \in \mathbb{R}$ and $\sum_j \tilde y_j A_j - C \succ 0$ for some $\tilde y$.

(P) $p = \max \; C \bullet X$ s.t. $A_j \bullet X = b_j$, $X \succeq 0$
(D) $d = \inf \; b^T y$ s.t. $\sum_j y_j A_j - C \succeq 0$

Goal: there exists $X$ feasible for (P) with $C \bullet X \ge d$.

We may assume $b \ne 0$ (else $b = 0$ implies $d = 0$, and we may choose $X = 0$).

Set $M := \{ \sum_j y_j A_j - C \mid y \in \mathbb{R}^m, \; b^T y \le d \}$.

Fact: $M \cap S^n_{++} = \emptyset$.
Proof: Otherwise, let $y$ with $b^T y \le d$ and $\sum_j y_j A_j - C \succ 0$. Then one can find $y'$ with $b^T y' < b^T y \le d$ and $\sum_j y'_j A_j - C \succeq 0$. This contradicts the minimality of $d$.
Sketch of proof for 2. (continued)

As $M \cap S^n_{++} = \emptyset$, there is a hyperplane separating $M$ from $S^n_{++}$. That is, there exists $Z \succeq 0$, $Z \ne 0$, with $Z \bullet Y \le 0$ for all $Y \in M$; i.e.,
$b^T y \le d \implies Z \bullet \Big( \sum_j y_j A_j - C \Big) \le 0.$

By the Farkas lemma, there exists $\mu \in \mathbb{R}_+$ for which
$(Z \bullet A_j)_j = \mu b \quad \text{and} \quad \mu d \le Z \bullet C.$

If $\mu = 0$, then $0 \ge Z \bullet \big( \sum_j \tilde y_j A_j - C \big) > 0$ (the strict inequality since $Z \succeq 0$ is non-zero and $\sum_j \tilde y_j A_j - C \succ 0$), a contradiction.

Hence $\mu > 0$, and $Z/\mu$ is feasible for (P) with $C \bullet (Z/\mu) \ge d$. QED.
An example with a duality gap

$p = \min \; x_{12}$ s.t. $\begin{pmatrix} 0 & x_{12} & 0 \\ x_{12} & x_{22} & 0 \\ 0 & 0 & 1 + x_{12} \end{pmatrix} \succeq 0$

$\phantom{p} = \min \; \tfrac12 E_{12} \bullet X$ s.t. $E_{11} \bullet X = 0$ [$a$], $E_{13} \bullet X = 0$ [$b$], $E_{23} \bullet X = 0$ [$c$], $(E_{33} - \tfrac12 E_{12}) \bullet X = 1$ [$y$], $X \succeq 0$

$d = \max \; y$ s.t. $\tfrac12 E_{12} - a E_{11} - b E_{13} - c E_{23} - y (E_{33} - \tfrac12 E_{12}) \succeq 0$

$\phantom{d} = \max \; y$ s.t. $\begin{pmatrix} -a & \frac{y+1}{2} & -b \\ \frac{y+1}{2} & 0 & -c \\ -b & -c & -y \end{pmatrix} \succeq 0$

Thus $p = 0$ (the zero $(1,1)$ diagonal entry forces $x_{12} = 0$) and $d = -1$ (the zero $(2,2)$ diagonal entry forces $y = -1$): a non-zero duality gap.
Complexity

Recall: an LP with rational data has a rational optimal solution whose bit size is polynomially bounded in terms of the bit size of the input data.

This is not true for SDP: any solution to
$\begin{pmatrix} x_1 - 2 & 0 \\ 0 & 1 \end{pmatrix} \succeq 0, \quad \begin{pmatrix} 1 & x_1 \\ x_1 & x_2 \end{pmatrix} \succeq 0, \quad \dots, \quad \begin{pmatrix} 1 & x_{n-1} \\ x_{n-1} & x_n \end{pmatrix} \succeq 0$
satisfies $x_1 \ge 2$ and $x_{i+1} \ge x_i^2$, hence $x_n \ge 2^{2^{n-1}}$.
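The doubly exponential growth is elementary to verify; a short sketch tracking the forced lower bounds:

```python
# Each 2x2 PSD constraint has nonnegative determinant, forcing
# x_1 >= 2 and x_{i+1} >= x_i^2.
n = 6
x = [2]
for _ in range(n - 1):
    x.append(x[-1] ** 2)

x_n = x[-1]                    # smallest feasible x_n: 2^(2^(n-1))
bit_size = x_n.bit_length()    # bit size grows exponentially in n
```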
Complexity (continued)

Theorem: SDP can be solved in polynomial time to any prescribed precision [assuming certain technical conditions hold].

Theoretically: use the ellipsoid method [since checking whether $X \succeq 0$ is in P, e.g. with Gaussian elimination].
Practically: use e.g. interior-point algorithms.

More precisely: let $K$ denote the feasible region of the SDP, and assume we know $R \in \mathbb{N}$ such that there is $X \in K$ with $\|X\| \le R$ whenever $K \ne \emptyset$. Given $\epsilon > 0$, the ellipsoid-based algorithm either finds $X^*$ at distance at most $\epsilon$ from $K$ such that $C \bullet X^* \ge C \bullet X - \epsilon$ for all $X \in K$ at distance at least $\epsilon$ from the border, or claims that there is no such $X^*$. The running time is polynomial in $n$, $m$, the bit sizes of the $A_j$, $C$, $b$, $\log R$, and $\log(1/\epsilon)$.
Feasibility of SDP

Feasibility SDP problem (F): given integer matrices $A_0, A_j \in S^n$, decide whether there exists $x \in \mathbb{R}^m$ such that $A_0 + \sum_{j=1}^m x_j A_j \succeq 0$.

(F) $\in$ NP $\iff$ (F) $\in$ co-NP. [Ramana 97]
(F) $\in$ P for fixed $n$ or $m$. [Porkolab-Khachiyan 97]
Testing the existence of a rational solution is in P for fixed dimension $m$. [Porkolab-Khachiyan 97]

More on complexity and algorithms for SDP in other lectures.
Using SDP to express convex quadratic constraints

Consider the convex quadratic constraint $x^T A x \le b^T x + c$, where $A \succeq 0$. Write $A = B^T B$ for some $B \in \mathbb{R}^{p \times n}$. Then:
$x^T A x \le b^T x + c \iff \begin{pmatrix} I_p & Bx \\ x^T B^T & b^T x + c \end{pmatrix} \succeq 0.$

This uses the Schur complement: given $C \succ 0$,
$\begin{pmatrix} C & B \\ B^T & A \end{pmatrix} \succeq 0 \iff A - B^T C^{-1} B \succeq 0.$
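The Schur complement equivalence can be sanity-checked numerically (a sketch; the choice $C = I_p$ and the test matrices are arbitrary illustrative data):

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 3, 4
B = rng.standard_normal((p, n))
C = np.eye(p)                           # C > 0 (identity, for simplicity)

def is_psd(M, tol=1e-9):
    return np.linalg.eigvalsh((M + M.T) / 2).min() >= -tol

schur = B.T @ np.linalg.solve(C, B)     # B^T C^{-1} B

A_good = schur + 0.1 * np.eye(n)        # slightly above the threshold
A_bad = schur - 0.1 * np.eye(n)         # slightly below

def block(A):
    return np.block([[C, B], [B.T, A]])

# Both sides of the equivalence agree in both cases.
ok_good = is_psd(block(A_good)) and is_psd(A_good - schur)
ok_bad = (not is_psd(block(A_bad))) and (not is_psd(A_bad - schur))
```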
The S-lemma [Yakubovich 1971]

Consider the quadratic polynomials:
$f(x) = x^T A x + 2 a^T x + \alpha = (1 \;\; x^T) \begin{pmatrix} \alpha & a^T \\ a & A \end{pmatrix} \begin{pmatrix} 1 \\ x \end{pmatrix}$
$g(x) = x^T B x + 2 b^T x + \beta = (1 \;\; x^T) \begin{pmatrix} \beta & b^T \\ b & B \end{pmatrix} \begin{pmatrix} 1 \\ x \end{pmatrix}$

Question: characterize when
(*) $\quad f(x) \ge 0 \implies g(x) \ge 0.$

Answer: assume $f(\bar x) > 0$ for some $\bar x$. Then (*) holds IFF
$\begin{pmatrix} \beta & b^T \\ b & B \end{pmatrix} - \lambda \begin{pmatrix} \alpha & a^T \\ a & A \end{pmatrix} \succeq 0 \quad \text{for some } \lambda \ge 0.$
Testing sums of squares of polynomials

Question: how to check whether a polynomial $p(x) = \sum_{\alpha \in \mathbb{N}^n, |\alpha| \le 2d} p_\alpha x^\alpha$ (where $x^\alpha = x_1^{\alpha_1} \cdots x_n^{\alpha_n}$) can be written as a sum of squares
$p(x) = \sum_{j=1}^m (u_j(x))^2$ for some polynomials $u_j$?

Answer: use semidefinite programming. Write $u_j(x) = (a_j)^T [x]_d$, where $[x]_d = (x^\alpha)_{|\alpha| \le d}$ is the vector of monomials of degree at most $d$. Then
$\sum_j (u_j(x))^2 = ([x]_d)^T \Big( \underbrace{\textstyle\sum_j a_j a_j^T}_{A \,\succeq\, 0} \Big) [x]_d.$

So test feasibility of the SDP:
$\sum_{\beta, \gamma: \; \beta + \gamma = \alpha} A_{\beta,\gamma} = p_\alpha \quad (|\alpha| \le 2d), \qquad A \succeq 0.$
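A tiny worked instance of the Gram-matrix correspondence (the polynomial $x^4 + 2x^2 + 1 = (x^2+1)^2$ is an illustrative choice, not from the lecture):

```python
import numpy as np

# With monomial basis [x]_2 = (1, x, x^2), the single square (x^2 + 1)^2
# corresponds to the rank-one PSD Gram matrix A = a a^T with a = (1, 0, 1).
a = np.array([1.0, 0.0, 1.0])
A = np.outer(a, a)

def p(t):
    return t**4 + 2 * t**2 + 1

def gram_eval(t):
    basis = np.array([1.0, t, t**2])   # [x]_2 evaluated at t
    return basis @ A @ basis

# A is PSD and [x]_2^T A [x]_2 reproduces p on sample points.
psd_ok = np.linalg.eigvalsh(A).min() >= -1e-12
max_err = max(abs(p(t) - gram_eval(t)) for t in np.linspace(-2, 2, 9))
```

In general one searches over all Gram matrices $A$ matching the coefficients of $p$, which is exactly the SDP feasibility problem above.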
Two milestone applications of SDP to combinatorial optimization

Approximating maximum stable sets and minimum vertex colorings with the theta number: work of Lovász [1979] and Grötschel-Lovász-Schrijver [1981].

The (first non-trivial) 0.878-approximation algorithm for max-cut of Goemans-Williamson [1995].
$G = (V, E)$ a graph. $S \subseteq V$ is a stable set if $S$ contains no edge.
$\alpha(G)$ := maximum size of a stable set (the stability number).
$\chi(G)$ := minimum number of colors needed to color the vertices so that adjacent vertices receive distinct colors.
Computing $\alpha(G)$, $\chi(G)$ is an NP-hard problem.

The theta number of Lovász [1979]:
$\vartheta(G) := \max \; J \bullet X$ s.t. $\mathrm{Tr}(X) = 1$, $X_{ij} = 0$ ($ij \in E$), $X \succeq 0.$

Lovász sandwich theorem: $\alpha(G) \le \vartheta(G) \le \chi(\bar G)$.
Hence one can compute $\alpha(G)$, $\chi(\bar G)$ via SDP for graphs with $\alpha(G) = \chi(\bar G)$ (e.g. perfect graphs).
Maximum cuts in graphs

$G = (V, E)$, $n = |V|$, $w = (w_e)_{e \in E}$ edge weights. For $S \subseteq V$, the cut $\delta(S)$ := all edges cut by the partition $(S, V \setminus S)$.

Max-Cut problem: find a cut of maximum weight $\mathrm{mc}(G)$.

Max-Cut is NP-hard; there is no $(16/17 + \epsilon)$-approximation algorithm unless P = NP.

Max-Cut is in P for graphs with no $K_5$ minor, since it can be computed with the LP [Barahona-Mahjoub 86]:
$\max \; w^T x$ s.t. $x_{ij} - x_{ik} - x_{jk} \le 0$, $\; x_{ij} + x_{ik} + x_{jk} \le 2$ for all $i, j, k \in V.$

An easy 1/2-approximation algorithm for $w \ge 0$: consider the random partition $(S, V \setminus S)$, where each $i \in S$ with probability 1/2 independently. Then $\mathbb{E}(w(\delta(S))) = w(E)/2 \ge \mathrm{mc}(G)/2$.
The Goemans-Williamson approximation algorithm for Max-Cut

Encode a partition $(S, V \setminus S)$ by a vector $x \in \{\pm 1\}^n$, and the cut $\delta(S)$ by the matrix $X = x x^T$.

Reformulate Max-Cut: $\max \; \frac12 \sum_{ij \in E} w_{ij} (1 - x_i x_j)$ s.t. $x \in \{\pm 1\}^n$.

Solve the SDP relaxation: $\max \; \frac12 \sum_{ij \in E} w_{ij} (1 - X_{ij})$ s.t. $X \succeq 0$, $\mathrm{diag}(X) = e$.

Let $v_1, \dots, v_n$ be unit vectors such that $X = (v_i^T v_j)$ is optimal for the SDP.

Randomized rounding: pick a random hyperplane $H$ with normal $r$, and take the partition $(S, V \setminus S)$ determined by the signs of the $v_i^T r$.
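The rounding step is easy to sketch in NumPy. No SDP solver is assumed here: random unit vectors stand in for the Gram vectors of an optimal $X$, and the toy 5-cycle instance is an illustrative choice:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(3)

# Toy instance: the 5-cycle with unit weights.
n = 5
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
w = {e: 1.0 for e in edges}

# Stand-in for the SDP step: random unit vectors v_1, ..., v_n
# (in practice these come from factoring an optimal X = (v_i^T v_j)).
V = rng.standard_normal((n, 3))
V /= np.linalg.norm(V, axis=1, keepdims=True)

def cut_weight(x):
    return sum(w[(i, j)] for (i, j) in edges if x[i] != x[j])

def round_hyperplane():
    r = rng.standard_normal(V.shape[1])   # random hyperplane normal
    return np.sign(V @ r)                 # side of the hyperplane per vertex

best_rounded = max(cut_weight(round_hyperplane()) for _ in range(50))
mc = max(cut_weight(x) for x in product([0, 1], repeat=n))  # brute force
```

For the 5-cycle the brute-force optimum is 4, and any rounded cut weight is at most that.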
Performance analysis

Theorem: For $w \ge 0$,
$\mathrm{mc}(G) \ge \mathbb{E}(w(\delta(S))) \ge 0.878 \cdot \mathrm{sdp}(G) \ge 0.878 \cdot \mathrm{mc}(G).$

Basic lemma: $\mathrm{Prob}(ij \in \delta(S)) = \dfrac{\arccos(v_i^T v_j)}{\pi}$.

$\mathbb{E}(w(\delta(S))) = \sum_{ij \in E} w_{ij} \, \mathrm{Prob}(ij \in \delta(S)) = \sum_{ij \in E} w_{ij} \, \frac{\arccos(v_i^T v_j)}{\pi} = \sum_{ij \in E} w_{ij} \, \frac{1 - v_i^T v_j}{2} \cdot \underbrace{\frac{2 \arccos(v_i^T v_j)}{\pi (1 - v_i^T v_j)}}_{\ge\, \alpha_{GW}} \ge \alpha_{GW} \cdot \mathrm{sdp}(G),$

where $\alpha_{GW} \ge 0.878$.
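The constant $\alpha_{GW}$ is a one-dimensional minimization and can be evaluated numerically on a grid (a quick sketch):

```python
import numpy as np

# alpha_GW = min over t in [-1, 1) of (arccos(t)/pi) / ((1 - t)/2).
t = np.linspace(-1.0, 1.0 - 1e-9, 1_000_001)
ratio = (np.arccos(t) / np.pi) / ((1.0 - t) / 2.0)
alpha_gw = ratio.min()
t_star = t[ratio.argmin()]   # minimizer, near t = -0.689
```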
Extension to $\pm 1$ quadratic programming

Given $A \in S^n$:
Integer problem: $\mathrm{ip}(A) := \max \; x^T A x$ s.t. $x \in \{\pm 1\}^n$.
SDP relaxation: $\mathrm{sdp}(A) := \max \; A \bullet X$ s.t. $X \succeq 0$, $\mathrm{diag}(X) = e$.

For $A = \frac14 L_w$, where $L_w$ is the Laplacian matrix of $(G, w)$ (with $L_w(i,i) = w(\delta(i))$ and $L_w(i,j) = -w_{ij}$), this is Max-Cut.

When $A \succeq 0$, $Ae = 0$, $A_{ij} \le 0$ ($i \ne j$): 0.878-approximation algorithm.
When $A \succeq 0$: a $\frac{2}{\pi}$ ($\approx 0.636$)-approximation algorithm. [Nesterov 97]
When $\mathrm{diag}(A) = 0$: the Grothendieck constant.
Nesterov's $\frac{2}{\pi}$-approximation algorithm

Solve the SDP: let $v_1, \dots, v_n$ be unit vectors such that $X = (v_i^T v_j)$ maximizes $A \bullet X$.

Random hyperplane rounding: pick a random unit vector $r$ and form the random $\pm 1$ vector $x = (\mathrm{sgn}(r^T v_i))_{i=1}^n$.

Lemma 1 [identity of Grothendieck]: $\mathbb{E}(x x^T) = \frac{2}{\pi} \arcsin X$ (entrywise).

Proof: $\mathbb{E}(\mathrm{sgn}(r^T v_i) \, \mathrm{sgn}(r^T v_j)) = 1 - 2 \, \mathrm{Prob}(\mathrm{sgn}(r^T v_i) \ne \mathrm{sgn}(r^T v_j)) = 1 - \frac{2 \arccos(v_i^T v_j)}{\pi} = \frac{2}{\pi} \Big( \frac{\pi}{2} - \arccos(v_i^T v_j) \Big) = \frac{2}{\pi} \arcsin(v_i^T v_j).$
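Lemma 1 is easy to confirm by Monte Carlo for a single pair of unit vectors (a sketch; the inner product $t = 0.3$ is an arbitrary illustrative value):

```python
import numpy as np

rng = np.random.default_rng(4)

# Two fixed unit vectors with inner product t = v_i^T v_j.
t = 0.3
vi = np.array([1.0, 0.0])
vj = np.array([t, np.sqrt(1 - t**2)])

# Monte Carlo estimate of E[sgn(r^T v_i) sgn(r^T v_j)] over random directions r.
R = rng.standard_normal((200_000, 2))
estimate = np.mean(np.sign(R @ vi) * np.sign(R @ vj))
exact = (2 / np.pi) * np.arcsin(t)   # the Grothendieck identity
err = abs(estimate - exact)
```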
Global performance analysis

Lemma 2: $\arcsin X - X \succeq 0$.
Proof: $\arcsin x - x = \sum_k a_k x^{2k+1}$ with all $a_k \ge 0$ (and entrywise odd powers of a PSD matrix are PSD, by the Schur product theorem).

Global analysis:
$\mathbb{E}(x^T A x) = A \bullet \mathbb{E}(x x^T) = A \bullet \Big( \mathbb{E}(x x^T) - \frac{2}{\pi} X \Big) + \frac{2}{\pi} A \bullet X = \underbrace{A}_{\succeq 0} \bullet \underbrace{\Big( \frac{2}{\pi} \arcsin X - \frac{2}{\pi} X \Big)}_{\succeq 0} + \frac{2}{\pi} A \bullet X \ge \frac{2}{\pi} A \bullet X.$

Therefore, for $A \succeq 0$: $\mathrm{ip}(A) \ge \frac{2}{\pi} \mathrm{sdp}(A)$.
The Grothendieck inequality

Assume $\mathrm{diag}(A) = 0$. The support graph $G_A$ has as edges the pairs $ij$ with $A_{ij} \ne 0$.

Definition: the Grothendieck constant $K(G)$ of a graph $G$ is the smallest constant $K$ for which $\mathrm{sdp}(A) \le K \cdot \mathrm{ip}(A)$ for all $A \in S^n$ with $G_A \subseteq G$.

Theorem ([Grothendieck 53], [Krivine 77], [Alon-Makarychev-Makarychev-Naor 05]): for $G$ complete bipartite,
$\frac{\pi}{2} \le K(G) \le \frac{\pi}{2 \ln(1 + \sqrt 2)} \approx 1.782.$
For general $G$: $\Omega(\log \omega(G)) \le K(G) \le O(\log \vartheta(\bar G))$.
Sketch of proof for Krivine's upper bound $K(K_{n,m}) \le \frac{\pi}{2 \ln(1 + \sqrt 2)} = \frac{\pi}{2c}$, where $c := \ln(1 + \sqrt 2)$:

1. Let $A \in \mathbb{R}^{n \times m}$. Let $u_i$ ($i \le n$) and $v_j$ ($j \le m$) be unit vectors in $H$ maximizing $\mathrm{sdp}(A) = \sum_{i \le n, \, j \le m} a_{ij} \, u_i \cdot v_j$.
2. Construct new unit vectors $S(u_i), T(v_j) \in \hat H$ satisfying $\arcsin(S(u_i) \cdot T(v_j)) = c \, u_i \cdot v_j$.
3. Pick a random unit vector $r \in \hat H$. Define the $\pm 1$ vectors $x, y$ by $x_i = \mathrm{sgn}(r^T S(u_i))$, $y_j = \mathrm{sgn}(r^T T(v_j))$.
4. Analysis:
$\mathbb{E}(x^T A y) = \sum_{i,j} a_{ij} \, \mathbb{E}(x_i y_j) = \sum_{i,j} a_{ij} \, \frac{2}{\pi} \arcsin(S(u_i) \cdot T(v_j)) = \sum_{i,j} a_{ij} \, \frac{2}{\pi} \, c \, u_i \cdot v_j = \frac{2c}{\pi} \, \mathrm{sdp}(A).$
Proof (continued): Step 2.

Given unit vectors $u, v \in H$, construct $S(u), T(v) \in \hat H$ satisfying $\arcsin(S(u) \cdot T(v)) = c \, u \cdot v$.

Recall $\sin x = \sum_{k \ge 0} (-1)^k \frac{x^{2k+1}}{(2k+1)!}$ and $\sinh x = \sum_{k \ge 0} \frac{x^{2k+1}}{(2k+1)!}$. Set $c := \sinh^{-1}(1) = \ln(1 + \sqrt 2)$.

In $\hat H := \bigoplus_{k \ge 0} H^{\otimes(2k+1)}$, set
$S(u) = \Big( \sqrt{\tfrac{c^{2k+1}}{(2k+1)!}} \; u^{\otimes(2k+1)} \Big)_{k \ge 0}, \qquad T(v) = \Big( (-1)^k \sqrt{\tfrac{c^{2k+1}}{(2k+1)!}} \; v^{\otimes(2k+1)} \Big)_{k \ge 0}.$

Then $S(u) \cdot T(v) = \sum_k (-1)^k \frac{c^{2k+1}}{(2k+1)!} (u \cdot v)^{2k+1} = \sin(c \, u \cdot v)$, and $\|S(u)\|^2 = \|T(v)\|^2 = \sinh(c) = 1$. Thus $\arcsin(S(u) \cdot T(v)) = c \, u \cdot v$.
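The two series identities behind this construction check out numerically (a sketch using truncated series):

```python
import numpy as np
from math import factorial

c = float(np.arcsinh(1.0))                    # c = ln(1 + sqrt(2))
c_check = abs(c - np.log(1 + np.sqrt(2.0)))

# S(u) . T(v) is the alternating series sum_k (-1)^k c^(2k+1)/(2k+1)! t^(2k+1)
# with t = u . v; it sums to sin(c t).
def inner(t, K=25):
    return sum((-1) ** k * c ** (2 * k + 1) / factorial(2 * k + 1) * t ** (2 * k + 1)
               for k in range(K))

max_err = max(abs(inner(t) - np.sin(c * t)) for t in np.linspace(-1, 1, 21))

# ||S(u)||^2 = sum_k c^(2k+1)/(2k+1)! = sinh(c) = 1: S(u), T(v) are unit vectors.
norm_sq = sum(c ** (2 * k + 1) / factorial(2 * k + 1) for k in range(25))
```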
A reformulation of the theta number

Theorem [Alon-Makarychev-Makarychev-Naor 05]: the smallest constant $C$ for which $\mathrm{sdp}(-A) \le C \cdot \mathrm{sdp}(A)$ for all $A \in S^n$ with $G_A \subseteq G$ is $C = \vartheta(\bar G) - 1$.

Geometrically: let $\mathcal E_n := \{ X \in S^n \mid X \succeq 0, \; \mathrm{diag}(X) = e \}$ be the elliptope, and $\mathcal E(G) \subseteq \mathbb{R}^E$ its projection onto the edge set of $G$.

Theorem [AMMN]: $-\mathcal E(G) \subseteq (\vartheta(\bar G) - 1) \, \mathcal E(G)$.
Link with matrix completion

Given a partial $n \times n$ matrix, whose entries are specified on the diagonal (say, equal to 1) and on a subset $E$ of the positions (given by $x \in \mathbb{R}^E$), decide whether it can be completed to a PSD matrix. Equivalently, decide whether $x \in \mathcal E(G)$.

A necessary condition: each fully specified principal submatrix is PSD. [Clique condition]

The clique condition is sufficient IFF $G$ is a chordal graph (i.e., no induced circuit of length $\ge 4$).

Example: the partial matrix (on a 4-circuit, with $a$, $b$ unspecified)
$\begin{pmatrix} 1 & 1 & a? & -1 \\ 1 & 1 & 1 & b? \\ a? & 1 & 1 & 1 \\ -1 & b? & 1 & 1 \end{pmatrix}$
is not completable to a PSD matrix: the entries equal to 1 force the Gram vectors $v_1 = v_2 = v_3 = v_4$, contradicting the specified entry $-1$.
Another necessary condition

Fact: for $a, b, c \in [0, \pi]$,
$\begin{pmatrix} 1 & \cos a & \cos b \\ \cos a & 1 & \cos c \\ \cos b & \cos c & 1 \end{pmatrix} \succeq 0 \iff a + b + c \le 2\pi, \;\; -a + b + c \ge 0, \;\; a - b + c \ge 0, \;\; a + b - c \ge 0.$

Write $x = \cos a$ for some $a \in [0, \pi]^E$. If $x \in \mathcal E(G)$, then [Metric condition]
$a(F) - a(C \setminus F) \le \pi (|F| - 1) \quad \text{for every circuit } C \text{ and } F \subseteq C \text{ with } |F| \text{ odd}.$

The metric condition is sufficient IFF $G$ has no $K_4$ minor.

Example: the matrix with diagonal entries 1 and all off-diagonal entries $-1/2$ (on $K_4$) is not PSD, while $a = \frac{2\pi}{3}(1, \dots, 1)$ satisfies the triangle inequalities.
Geometrically

$\mathrm{CUT}^{\pm 1}(G) \subseteq \mathcal E(G)$, with equality IFF $G$ has no $K_3$ minor.
$\mathcal E(G) \subseteq \cos(\pi \, \mathrm{MET}^{01}(G))$, with equality IFF $G$ has no $K_4$ minor.
$\mathcal E(G) \subseteq \cos(\pi \, \mathrm{CUT}^{01}(G))$, with equality IFF $G$ has no $K_4$ minor.

The Goemans-Williamson randomized rounding argument shows: if $v_1, \dots, v_n$ are unit vectors and $a_{ij} := \arccos(v_i^T v_j)$ are their pairwise angles, then
$\sum_{1 \le i < j \le n} c_{ij} \, \frac{a_{ij}}{\pi} \le c_0$
whenever $c^T z \le c_0$ is an inequality valid for the cuts of $K_n$.
Extension to max k-cut [Frieze-Jerrum 95]

Max k-cut: given $G = (V, E)$, $w \in \mathbb{R}^E_+$, $k \ge 2$, find a partition $P = (S_1, \dots, S_k)$ maximizing $w(P) = \sum_{e \in E, \; e \text{ cut by } P} w_e$.

Pick unit vectors $a_1, \dots, a_k \in \mathbb{R}^k$ with $a_i^T a_j = -\frac{1}{k-1}$ for $i \ne j$.

Model max k-cut:
$\mathrm{mc}_k(G) = \max \; \frac{k-1}{k} \sum_{ij \in E} w_{ij} (1 - x_i^T x_j)$ s.t. $x_1, \dots, x_n \in \{a_1, \dots, a_k\}$.

SDP relaxation:
$\mathrm{sdp}_k(G) = \max \; \frac{k-1}{k} \sum_{ij \in E} w_{ij} (1 - v_i^T v_j)$ s.t. $v_i$ unit vectors, $v_i^T v_j \ge -\frac{1}{k-1}$.

Randomized rounding: pick $k$ independent random unit vectors $r_1, \dots, r_k$ and take the partition $P = (S_1, \dots, S_k)$ with $S_h = \{ i \mid v_i^T r_h \ge v_i^T r_{h'} \; \forall h' \}$.
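The rounding step can be sketched in a few lines of NumPy. As before, no SDP solver is assumed: random unit vectors stand in for an optimal Gram factorization, and the sizes are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(5)
k, n, d = 3, 6, 4

# Stand-in for an optimal SDP solution: random unit vectors v_i
# (in practice they come from factoring the SDP optimum X = (v_i^T v_j)).
V = rng.standard_normal((n, d))
V /= np.linalg.norm(V, axis=1, keepdims=True)

# Frieze-Jerrum rounding: draw k independent random vectors r_1, ..., r_k;
# vertex i joins the class h maximizing v_i^T r_h.
R = rng.standard_normal((k, d))
labels = np.argmax(V @ R.T, axis=1)   # labels[i] in {0, ..., k-1}
```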
Analysis

The probability that edge $ij$ is not cut, i.e., that $v_i$ and $v_j$ are both closest to the same $r_h$, equals $k$ times a function $f(v_i^T v_j)$.

$\mathbb{E}(w(P)) = \sum_{ij \in E} w_{ij} (1 - k f(v_i^T v_j)) = \sum_{ij \in E} w_{ij} \, \frac{k-1}{k} (1 - v_i^T v_j) \cdot \underbrace{\frac{1 - k f(v_i^T v_j)}{\frac{k-1}{k} (1 - v_i^T v_j)}}_{\ge\, \alpha_k} \ge \alpha_k \cdot \mathrm{sdp}_k(G),$

where
$\alpha_k := \min_{-\frac{1}{k-1} \le t < 1} \; \frac{1 - k f(t)}{\frac{k-1}{k} (1 - t)}.$

$\alpha_2 = \alpha_{GW} \approx 0.878$: the GW approximation ratio for max-cut.
$\alpha_3 = \frac{7}{12} + \frac{3}{4\pi^2} \arccos^2(-1/4) > 0.836$ [de Klerk et al.]
$\alpha_{100} > 0.99.$
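The quoted closed-form value of $\alpha_3$ is a one-liner to evaluate (a quick numerical check of the stated bound):

```python
import numpy as np

# alpha_3 = 7/12 + 3/(4 pi^2) * arccos(-1/4)^2, approximately 0.836008.
alpha_3 = 7 / 12 + 3 / (4 * np.pi ** 2) * np.arccos(-1 / 4) ** 2
```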