Introduction to Semidefinite Programming I: Basic properties and variations on the Goemans-Williamson approximation algorithm for max-cut

MFO seminar on Semidefinite Programming, May 30, 2010

Positive semidefinite matrices

Definition: For a symmetric $n \times n$ matrix $X$, the following conditions are equivalent:
1. $X$ is positive semidefinite (written $X \succeq 0$): all eigenvalues of $X$ are nonnegative.
2. $u^T X u \ge 0$ for all $u \in \mathbb{R}^n$.
3. $X = UU^T$ for some matrix $U \in \mathbb{R}^{n \times p}$.
4. $X_{ij} = v_i^T v_j$ for all $i, j \in [n]$, for some vectors $v_1, \ldots, v_n \in \mathbb{R}^p$; we say that $X$ is the Gram matrix of the $v_i$'s.
5. All principal minors of $X$ are nonnegative.

Definition: $X$ is positive definite (written $X \succ 0$) if all eigenvalues of $X$ are positive.
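These equivalences are easy to check numerically. Below is a minimal numpy sketch (my own illustration, not part of the slides) verifying conditions 1-3 on a Gram matrix built from random vectors:

    import numpy as np

    rng = np.random.default_rng(0)
    V = rng.standard_normal((5, 3))      # rows: vectors v_1, ..., v_5 in R^3
    X = V @ V.T                          # Gram matrix, hence PSD by condition 4

    # Condition 1: all eigenvalues nonnegative (up to numerical tolerance).
    assert np.linalg.eigvalsh(X).min() >= -1e-9

    # Condition 2: u^T X u >= 0, spot-checked on random vectors u.
    for _ in range(100):
        u = rng.standard_normal(5)
        assert u @ X @ u >= -1e-9

    # Condition 3: a factorization X = U U^T from the eigendecomposition.
    w, Q = np.linalg.eigh(X)
    U = Q @ np.diag(np.sqrt(np.clip(w, 0, None)))
    assert np.allclose(U @ U.T, X)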

Notation

$\mathcal{S}^n$: the space of $n \times n$ symmetric matrices.
$\mathcal{S}^n_+$: the cone of positive semidefinite matrices.
$\mathcal{S}^n_{++}$: the cone of positive definite matrices; $\mathcal{S}^n_{++}$ is the interior of the cone $\mathcal{S}^n_+$.

Trace inner product on $\mathcal{S}^n$: $A \bullet B = \mathrm{Tr}(A^T B) = \sum_{i,j=1}^n A_{ij} B_{ij}$.

The PSD cone is self-dual: for $A \in \mathcal{S}^n$,
$A \in \mathcal{S}^n_+ \iff A \bullet B \ge 0 \text{ for all } B \in \mathcal{S}^n_+.$

Primal/dual semidefinite programs

Given matrices $C, A_1, \ldots, A_m \in \mathcal{S}^n$ and a vector $b \in \mathbb{R}^m$:

Primal SDP: $p^* := \max_X \; C \bullet X$ such that $A_j \bullet X = b_j$ ($j = 1, \ldots, m$), $X \succeq 0$.

Dual SDP: $d^* := \min_y \; b^T y$ such that $\sum_{j=1}^m y_j A_j - C \succeq 0$.

Weak duality: $p^* \le d^*$.
Proof: If $X$ is primal feasible and $y$ is dual feasible, then
$0 \le \Big(\sum_{j=1}^m y_j A_j - C\Big) \bullet X = \sum_j y_j (A_j \bullet X) - C \bullet X = b^T y - C \bullet X.$

Analogy between LP and SDP

Given vectors $c, a_1, \ldots, a_m \in \mathbb{R}^n$ and $b \in \mathbb{R}^m$:

Primal/dual LP:
$\max_x \; c^T x$ such that $a_j^T x = b_j$ ($j = 1, \ldots, m$), $x \in \mathbb{R}^n_+$;
$\min_y \; b^T y$ such that $\sum_{j=1}^m y_j a_j - c \ge 0$.

Primal SDP: $\max_X \; C \bullet X$ such that $A_j \bullet X = b_j$ ($j = 1, \ldots, m$), $X \in \mathcal{S}^n_+$.

SDP is the analogue of LP, replacing the cone $\mathbb{R}^n_+$ by $\mathcal{S}^n_+$; we recover LP when $C$ and the $A_j$ are diagonal matrices.
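A minimal cvxpy sketch of such a primal/dual pair (my own toy instance, not from the slides): with the single constraint matrix $A_1 = I$ and $b_1 = 1$, the primal computes $\lambda_{\max}(C)$, and the dual value matches it since both problems are strictly feasible:

    import cvxpy as cp
    import numpy as np

    C = np.array([[2.0, 1.0, 0.0],
                  [1.0, 0.0, 1.0],
                  [0.0, 1.0, -1.0]])

    # Primal: max C.X  s.t.  Tr(X) = 1, X PSD.
    X = cp.Variable((3, 3), PSD=True)
    primal = cp.Problem(cp.Maximize(cp.trace(C @ X)), [cp.trace(X) == 1])
    p_star = primal.solve()

    # Dual: min y  s.t.  y*I - C PSD.
    y = cp.Variable()
    dual = cp.Problem(cp.Minimize(y), [y * np.eye(3) - C >> 0])
    d_star = dual.solve()

    # Both are strictly feasible, so p* = d* = lambda_max(C).
    print(p_star, d_star, np.linalg.eigvalsh(C).max())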

Strong duality: $p^* = d^*$?

Strong duality always holds for LP, but we need a regularity condition (e.g., the Slater condition) to guarantee strong duality for SDP!

Primal (P) / dual (D) SDPs, with $p^* \le d^*$:
(P) $p^* = \sup \; C \bullet X$ s.t. $A_j \bullet X = b_j$ ($j = 1, \ldots, m$), $X \succeq 0$
(D) $d^* = \inf \; b^T y$ s.t. $\sum_{j=1}^m y_j A_j - C \succeq 0$

Strong duality theorem:
1. If (P) is strictly feasible (some $X \succ 0$ is feasible for (P)) and bounded ($p^* < \infty$), then $p^* = d^*$ and (D) attains its infimum.
2. If (D) is strictly feasible (there is $y$ with $\sum_j y_j A_j - C \succ 0$) and bounded ($d^* > -\infty$), then $p^* = d^*$ and (P) attains its supremum.

Proof of 2.

Assume $d^* \in \mathbb{R}$ and $\sum_j \tilde y_j A_j - C \succ 0$ for some $\tilde y$. Recall:
(P) $p^* = \max \; C \bullet X$ s.t. $A_j \bullet X = b_j$ ($\forall j$), $X \succeq 0$;
(D) $d^* = \inf \; b^T y$ s.t. $\sum_j y_j A_j - C \succeq 0$.

Goal: there exists $X$ feasible for (P) with $C \bullet X \ge d^*$.

We may assume $b \ne 0$ (else $b = 0$ implies $d^* = 0$, and we may choose $X = 0$).

Set $M := \big\{ \sum_j y_j A_j - C \;\big|\; y \in \mathbb{R}^m, \; b^T y \le d^* \big\}$.

Fact: $M \cap \mathcal{S}^n_{++} = \emptyset$.
Proof: Otherwise, let $y$ with $b^T y \le d^*$ and $\sum_j y_j A_j - C \succ 0$. Then one can find $y'$ with $b^T y' < b^T y \le d^*$ and $\sum_j y'_j A_j - C \succeq 0$. This contradicts the minimality of $d^*$.

Sketch of proof for 2. (continued)

As $M \cap \mathcal{S}^n_{++} = \emptyset$, there is a hyperplane separating $M$ from $\mathcal{S}^n_{++}$. That is, there exists a non-zero $Z \succeq 0$ with $Z \bullet Y \le 0$ for all $Y \in M$, i.e.,
$b^T y \le d^* \;\Longrightarrow\; Z \bullet \Big(\sum_j y_j A_j - C\Big) \le 0.$

By Farkas' lemma, there exists $\mu \in \mathbb{R}_+$ for which $(Z \bullet A_j)_j = \mu b$ and $\mu d^* \le Z \bullet C$.

If $\mu = 0$, then $Z \bullet A_j = 0$ for all $j$ and $Z \bullet C \ge 0$, so
$0 \ge -Z \bullet C = Z \bullet \Big(\sum_j \tilde y_j A_j - C\Big) > 0,$
a contradiction (the strict inequality holds because $Z \succeq 0$ is non-zero and $\sum_j \tilde y_j A_j - C \succ 0$).

Hence $\mu > 0$, and $X := Z/\mu$ is feasible for (P) with $C \bullet (Z/\mu) \ge d^*$. QED.

An example with a duality gap

$p^* = \min \; x_{12}$ s.t. $\begin{pmatrix} 0 & x_{12} & 0 \\ x_{12} & x_{22} & 0 \\ 0 & 0 & 1 + x_{12} \end{pmatrix} \succeq 0$

$\phantom{p^*} = \min \; \tfrac12 E_{12} \bullet X$ s.t. $E_{11} \bullet X = 0$ (multiplier $a$), $E_{13} \bullet X = 0$ ($b$), $E_{23} \bullet X = 0$ ($c$), $(E_{33} - \tfrac12 E_{12}) \bullet X = 1$ ($y$), $X \succeq 0$.

$d^* = \max \; y$ s.t. $\tfrac12 E_{12} - a E_{11} - b E_{13} - c E_{23} - y (E_{33} - \tfrac12 E_{12}) \succeq 0$

$\phantom{d^*} = \max \; y$ s.t. $\begin{pmatrix} -a & \frac{y+1}{2} & -b \\ \frac{y+1}{2} & 0 & -c \\ -b & -c & -y \end{pmatrix} \succeq 0.$

In the primal, the zero $(1,1)$ entry forces $x_{12} = 0$, so $p^* = 0$; in the dual, the zero $(2,2)$ entry forces $y = -1$, so $d^* = -1$. Thus $p^* = 0 > d^* = -1$: a non-zero duality gap.

Complexity

Recall: an LP with rational data has a rational optimal solution whose bit size is polynomially bounded in terms of the bit length of the input data.

This is not true for SDP: any solution to
$x_1 \ge 2, \quad \begin{pmatrix} 1 & x_1 \\ x_1 & x_2 \end{pmatrix} \succeq 0, \; \ldots, \; \begin{pmatrix} 1 & x_{n-1} \\ x_{n-1} & x_n \end{pmatrix} \succeq 0$
satisfies $x_j \ge x_{j-1}^2$ for each $j$, hence $x_n \ge 2^{2^{n-1}}$: the bit size of any feasible solution is exponential in the bit size of the SDP data.
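A tiny Python sketch (my own illustration) of why this blows up: following the constraint chain with the minimal feasible values, the bit length of $x_j$ roughly doubles at every step.

    # Minimal feasible values of the chain x_1 >= 2, x_j >= x_{j-1}^2.
    x = 2
    for j in range(1, 7):
        print(j, x.bit_length())  # bit lengths: 2, 3, 5, 9, 17, 33, ...
        x = x * x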

Complexity (continued)

Theorem: SDP can be solved in polynomial time to arbitrary prescribed precision [assuming certain technical conditions hold].

Theoretically: use the ellipsoid method [since checking whether $X \succeq 0$ is in P, e.g. with Gaussian elimination].
Practically: use e.g. interior-point algorithms.

More precisely: let $K$ denote the feasible region of the SDP, and assume we know $R \in \mathbb{N}$ such that some $X \in K$ has $\|X\| \le R$ whenever $K \ne \emptyset$. Given $\epsilon > 0$, the ellipsoid-based algorithm either finds $X^*$ at distance at most $\epsilon$ from $K$ such that $C \bullet X^* \ge C \bullet X - \epsilon$ for all $X \in K$ at distance at least $\epsilon$ from the border, or claims that there is no such $X$. The running time is polynomial in $n$, $m$, the bit sizes of the $A_j$, $C$, $b$, $\log R$, and $\log(1/\epsilon)$.

Feasibility of SDP

Feasibility problem (F): given integer matrices $A_0, A_1, \ldots, A_m \in \mathcal{S}^n$, decide whether there exists $x \in \mathbb{R}^m$ such that $A_0 + \sum_{j=1}^m x_j A_j \succeq 0$.

(F) $\in$ NP $\iff$ (F) $\in$ co-NP. [Ramana 97]
(F) $\in$ P for fixed $n$ or $m$. [Porkolab-Khachiyan 97]
Testing the existence of a rational solution is in P for fixed dimension $m$. [Porkolab-Khachiyan 97]

More on complexity and algorithms for SDP in other lectures.

Using SDP to express convex quadratic constraints

Consider the convex quadratic constraint $x^T A x \le b^T x + c$, where $A \succeq 0$. Write $A = B^T B$ for some $B \in \mathbb{R}^{p \times n}$. Then:
$x^T A x \le b^T x + c \iff \begin{pmatrix} I_p & Bx \\ x^T B^T & b^T x + c \end{pmatrix} \succeq 0.$

This uses the Schur complement: given $C \succ 0$,
$\begin{pmatrix} C & B \\ B^T & A \end{pmatrix} \succeq 0 \iff A - B^T C^{-1} B \succeq 0.$
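A minimal cvxpy sketch (my own illustration; the data $A$, $b$, $c$ and the objective are made up) of this trick: the quadratic constraint is imposed as a linear matrix inequality built with the Schur complement.

    import cvxpy as cp
    import numpy as np

    n = 3
    A = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.0],
                  [0.0, 0.0, 1.5]])    # positive definite
    b = np.ones(n)
    c = 1.0
    B = np.linalg.cholesky(A).T        # A = B^T B

    x = cp.Variable(n)
    # LMI: [[I, Bx], [(Bx)^T, b^T x + c]] >= 0  <=>  x^T A x <= b^T x + c.
    M = cp.bmat([[np.eye(n), cp.reshape(B @ x, (n, 1))],
                 [cp.reshape(B @ x, (1, n)), cp.reshape(b @ x + c, (1, 1))]])
    M = 0.5 * (M + M.T)                # help cvxpy certify symmetry
    prob = cp.Problem(cp.Maximize(cp.sum(x)), [M >> 0])
    prob.solve()

    xv = x.value
    print(xv @ A @ xv, b @ xv + c)     # constraint is active at the optimum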

The S-lemma [Yakubovich 1971]

Consider the quadratic polynomials
$f(x) = x^T A x + 2 a^T x + \alpha = \begin{pmatrix} 1 & x^T \end{pmatrix} \begin{pmatrix} \alpha & a^T \\ a & A \end{pmatrix} \begin{pmatrix} 1 \\ x \end{pmatrix},$
$g(x) = x^T B x + 2 b^T x + \beta = \begin{pmatrix} 1 & x^T \end{pmatrix} \begin{pmatrix} \beta & b^T \\ b & B \end{pmatrix} \begin{pmatrix} 1 \\ x \end{pmatrix}.$

Question: characterize when
(*) $f(x) \ge 0 \implies g(x) \ge 0$.

Answer: assume $f(\bar x) > 0$ for some $\bar x$. Then (*) holds if and only if
$\begin{pmatrix} \beta & b^T \\ b & B \end{pmatrix} - \lambda \begin{pmatrix} \alpha & a^T \\ a & A \end{pmatrix} \succeq 0 \quad \text{for some } \lambda \ge 0.$

Testing sums of squares of polynomials

Question: how to check whether a polynomial $p(x) = \sum_{\alpha \in \mathbb{N}^n, |\alpha| \le 2d} p_\alpha x^\alpha$ (where $x^\alpha = x_1^{\alpha_1} \cdots x_n^{\alpha_n}$) can be written as a sum of squares $p(x) = \sum_{j=1}^m (u_j(x))^2$ for some polynomials $u_j$?

Answer: use semidefinite programming. Write $u_j(x) = (a_j)^T [x]_d$, where $[x]_d = (x^\alpha)_{|\alpha| \le d}$ is the vector of monomials of degree at most $d$. Then
$\sum_j (u_j(x))^2 = ([x]_d)^T \Big( \sum_j a_j a_j^T \Big) [x]_d = ([x]_d)^T A \, [x]_d \quad \text{with } A := \sum_j a_j a_j^T \succeq 0.$

So $p$ is a sum of squares if and only if the following SDP is feasible:
$\sum_{\beta + \gamma = \alpha, \; |\beta|, |\gamma| \le d} A_{\beta,\gamma} = p_\alpha \;\; (|\alpha| \le 2d), \qquad A \succeq 0.$
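A minimal cvxpy sketch (my own toy example) of this test for the univariate polynomial $p(x) = 4x^4 + 4x^3 + 5x^2 + 2x + 1 = (2x^2 + x + 1)^2$, with monomial basis $[x]_2 = (1, x, x^2)$:

    import cvxpy as cp
    import numpy as np

    p = {0: 1.0, 1: 2.0, 2: 5.0, 3: 4.0, 4: 4.0}   # coefficients p_alpha

    A = cp.Variable((3, 3), PSD=True)   # Gram matrix, indexed by degrees 0, 1, 2
    constraints = []
    for alpha, coeff in p.items():
        # sum of A[beta, gamma] over beta + gamma = alpha must equal p_alpha
        terms = [A[beta, alpha - beta] for beta in range(3) if 0 <= alpha - beta <= 2]
        constraints.append(sum(terms) == coeff)

    prob = cp.Problem(cp.Minimize(0), constraints)
    prob.solve()
    print(prob.status)                  # 'optimal' => p is a sum of squares

    # Recover an explicit SOS decomposition from A = U^T U (rows of U give the u_j).
    w, Q = np.linalg.eigh(A.value)
    U = np.diag(np.sqrt(np.clip(w, 0, None))) @ Q.T
    print(U)                            # u_j(x) = U[j,0] + U[j,1]*x + U[j,2]*x^2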

Two milestone applications of SDP to combinatorial optimization

Approximating maximum stable sets and minimum vertex colorings with the theta number: work of Lovász [1979] and Grötschel-Lovász-Schrijver [1981].

The (first non-trivial) 0.878-approximation algorithm for max-cut of Goemans-Williamson [1995].

The theta number

$G = (V, E)$ a graph. $S \subseteq V$ is a stable set if $S$ contains no edge.
$\alpha(G)$ := maximum size of a stable set (the stability number).
$\chi(G)$ := minimum number of colors needed to color the vertices so that adjacent vertices receive distinct colors.
Computing $\alpha(G)$ and $\chi(G)$ is NP-hard.

The theta number of Lovász [1979]:
$\vartheta(G) := \max \; J \bullet X \;\; \text{s.t.} \;\; \mathrm{Tr}(X) = 1, \; X_{ij} = 0 \; (ij \in E), \; X \succeq 0,$
where $J$ is the all-ones matrix.

Lovász sandwich theorem: $\alpha(G) \le \vartheta(G) \le \chi(\bar G)$.
Hence one can compute $\alpha(G)$ and $\chi(\bar G)$ via SDP for graphs with $\alpha(G) = \chi(\bar G)$ (in particular for perfect graphs).
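A minimal cvxpy sketch (my own illustration) computing $\vartheta(C_5)$ for the 5-cycle, where Lovász's classical result gives $\vartheta(C_5) = \sqrt 5 \approx 2.236$ (while $\alpha(C_5) = 2$):

    import cvxpy as cp
    import numpy as np

    n = 5
    edges = [(i, (i + 1) % n) for i in range(n)]
    J = np.ones((n, n))

    X = cp.Variable((n, n), PSD=True)
    constraints = [cp.trace(X) == 1] + [X[i, j] == 0 for (i, j) in edges]
    prob = cp.Problem(cp.Maximize(cp.trace(J @ X)), constraints)
    prob.solve()
    print(prob.value, np.sqrt(5))   # both ~ 2.2360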

Maximum cuts in graphs

$G = (V, E)$, $n = |V|$, $w = (w_e)_{e \in E}$ edge weights. For $S \subseteq V$, the cut $\delta(S)$ consists of all edges cut by the partition $(S, V \setminus S)$.

Max-Cut problem: find a cut of maximum weight $\mathrm{mc}(G)$. Max-Cut is NP-hard; there is no $(16/17 + \epsilon)$-approximation algorithm unless P = NP.

Max-Cut is in P for graphs with no $K_5$ minor, since it can then be computed with the LP [Barahona-Mahjoub 86]:
$\max \; w^T x \;\; \text{s.t.} \;\; x_{ij} - x_{ik} - x_{jk} \le 0, \;\; x_{ij} + x_{ik} + x_{jk} \le 2 \;\; \forall i, j, k \in V.$

An easy 1/2-approximation algorithm for $w \ge 0$: consider the random partition $(S, V \setminus S)$ where each vertex $i$ is put in $S$ independently with probability 1/2. Then
$\mathbb{E}(w(\delta(S))) = w(E)/2 \ge \mathrm{mc}(G)/2.$

Goemans-Williamson approximation algorithm for Max-Cut

Encode a partition $(S, V \setminus S)$ by a vector $x \in \{\pm 1\}^n$, and the cut $\delta(S)$ by the matrix $X = xx^T$.

Reformulate Max-Cut: $\max \; \frac12 \sum_{ij \in E} w_{ij} (1 - x_i x_j)$ s.t. $x \in \{\pm 1\}^n$.

Solve the SDP relaxation: $\max \; \frac12 \sum_{ij \in E} w_{ij} (1 - X_{ij})$ s.t. $X \succeq 0$, $\mathrm{diag}(X) = e$.

Let $v_1, \ldots, v_n$ be unit vectors such that $X = (v_i^T v_j)$ is optimal for the SDP.

Randomized rounding: pick a random hyperplane $H$ with normal $r$, and form the partition $(S, V \setminus S)$ according to the sign of $v_i^T r$.
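The whole algorithm fits in a few lines. A minimal cvxpy/numpy sketch (my own illustration; the 5-cycle instance and the best-of-100 rounding are my choices):

    import cvxpy as cp
    import numpy as np

    rng = np.random.default_rng(0)
    n = 5
    edges = [(i, (i + 1) % n) for i in range(n)]   # C_5, unit weights; mc = 4

    # SDP relaxation: max (1/2) sum_{ij in E} (1 - X_ij), diag(X) = e, X PSD.
    X = cp.Variable((n, n), PSD=True)
    obj = 0.5 * sum(1 - X[i, j] for (i, j) in edges)
    prob = cp.Problem(cp.Maximize(obj), [cp.diag(X) == 1])
    sdp_val = prob.solve()

    # Unit vectors v_i with X = (v_i^T v_j), via an eigendecomposition of X.
    w, Q = np.linalg.eigh(X.value)
    V = Q @ np.diag(np.sqrt(np.clip(w, 0, None)))  # rows are the v_i

    # Random hyperplane rounding: S = { i : v_i^T r >= 0 }.
    best = 0
    for _ in range(100):
        r = rng.standard_normal(n)
        x = np.sign(V @ r)
        best = max(best, sum(1 for (i, j) in edges if x[i] != x[j]))

    print(sdp_val, best)   # ~4.52 and 4: the rounded cut exceeds 0.878*sdp_val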

Performance analysis

Theorem: for $w \ge 0$,
$\mathrm{mc}(G) \ge \mathbb{E}(w(\delta(S))) \ge 0.878 \cdot \mathrm{sdp}(G) \ge 0.878 \cdot \mathrm{mc}(G).$

Basic lemma: $\mathrm{Prob}(ij \in \delta(S)) = \frac{\arccos(v_i^T v_j)}{\pi}$.

Hence
$\mathbb{E}(w(\delta(S))) = \sum_{ij \in E} w_{ij} \, \mathrm{Prob}(ij \in \delta(S)) = \sum_{ij \in E} w_{ij} \, \frac{\arccos(v_i^T v_j)}{\pi} = \sum_{ij \in E} w_{ij} \, \frac{1 - v_i^T v_j}{2} \cdot \frac{2 \arccos(v_i^T v_j)}{\pi (1 - v_i^T v_j)} \ge \alpha_{GW} \cdot \mathrm{sdp}(G),$
since each factor $\frac{2 \arccos(t)}{\pi (1 - t)}$ is at least $\alpha_{GW} := \min_{-1 \le t < 1} \frac{2 \arccos(t)}{\pi (1 - t)} \approx 0.878$.
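A quick numpy sketch (my own Monte Carlo check, not from the slides) of the basic lemma for two unit vectors with inner product 0.6:

    import numpy as np

    rng = np.random.default_rng(0)
    u = np.array([1.0, 0.0, 0.0])
    v = np.array([0.6, 0.8, 0.0])            # unit vector with u^T v = 0.6

    trials = 200_000
    r = rng.standard_normal((trials, 3))     # Gaussian normals => uniform hyperplanes
    separated = np.mean(np.sign(r @ u) != np.sign(r @ v))
    print(separated, np.arccos(0.6) / np.pi) # both ~ 0.2952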

Extension to ±1 quadratic programming

Given $A \in \mathcal{S}^n$:
Integer problem: $\mathrm{ip}(A) := \max \; x^T A x$ s.t. $x \in \{\pm 1\}^n$.
SDP relaxation: $\mathrm{sdp}(A) := \max \; A \bullet X$ s.t. $X \succeq 0$, $\mathrm{diag}(X) = e$.

With $A = \frac14 L_w$, where $L_w$ is the Laplacian matrix of $(G, w)$ (so $L_w(i,i) = w(\delta(i))$ and $L_w(i,j) = -w_{ij}$), we recover Max-Cut.

When $A \succeq 0$, $Ae = 0$, $A_{ij} \le 0$ ($i \ne j$): 0.878-approximation algorithm.
When $A \succeq 0$: $\frac{2}{\pi}$ ($\approx 0.636$)-approximation algorithm. [Nesterov 97]
When $\mathrm{diag}(A) = 0$: Grothendieck constant.

Nesterov's $\frac{2}{\pi}$-approximation algorithm

Solve the SDP: let $v_1, \ldots, v_n$ be unit vectors such that $X = (v_i^T v_j)$ maximizes $A \bullet X$.

Random hyperplane rounding: pick a random unit vector $r$ and set $x := (\mathrm{sgn}(r^T v_i))_{i=1}^n \in \{\pm 1\}^n$.

Lemma 1 [identity of Grothendieck]: $\mathbb{E}(xx^T) = \frac{2}{\pi} \arcsin X$ (entrywise).

Proof:
$\mathbb{E}(\mathrm{sgn}(r^T v_i) \, \mathrm{sgn}(r^T v_j)) = 1 - 2 \, \mathrm{Prob}(\mathrm{sgn}(r^T v_i) \ne \mathrm{sgn}(r^T v_j)) = 1 - \frac{2 \arccos(v_i^T v_j)}{\pi} = \frac{2}{\pi} \Big( \frac{\pi}{2} - \arccos(v_i^T v_j) \Big) = \frac{2}{\pi} \arcsin(v_i^T v_j).$
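A quick numpy sketch (my own Monte Carlo check) of Lemma 1 on a random correlation matrix:

    import numpy as np

    rng = np.random.default_rng(0)
    V = rng.standard_normal((4, 4))
    V /= np.linalg.norm(V, axis=1, keepdims=True)  # rows: unit vectors v_i
    X = V @ V.T

    trials = 200_000
    R = rng.standard_normal((trials, 4))
    S = np.sign(R @ V.T)                           # signs sgn(r^T v_i), one row per r
    emp = (S.T @ S) / trials                       # empirical E(x x^T)
    print(np.abs(emp - (2 / np.pi) * np.arcsin(X)).max())  # ~0 up to MC error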

Global performance analysis

Lemma 2: $\arcsin X - X \succeq 0$ (with $\arcsin$ applied entrywise).

Proof: $\arcsin t - t = \sum_k a_k t^{2k+1}$ with all $a_k \ge 0$, so $\arcsin X - X$ is a nonnegative combination of entrywise (Hadamard) powers of $X$, each of which is PSD by the Schur product theorem.

Global analysis:
$\mathbb{E}(x^T A x) = A \bullet \mathbb{E}(xx^T) = A \bullet \Big( \frac{2}{\pi} \arcsin X - \frac{2}{\pi} X \Big) + \frac{2}{\pi} \, A \bullet X \ge \frac{2}{\pi} \, A \bullet X,$
where the inequality holds since $A \succeq 0$ and $\arcsin X - X \succeq 0$.

Therefore, for $A \succeq 0$: $\mathrm{ip}(A) \ge \frac{2}{\pi} \, \mathrm{sdp}(A)$.
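A quick numpy sketch (my own check) of Lemma 2 on random correlation matrices:

    import numpy as np

    rng = np.random.default_rng(0)
    for _ in range(100):
        V = rng.standard_normal((6, 3))
        V /= np.linalg.norm(V, axis=1, keepdims=True)
        X = V @ V.T                               # PSD with unit diagonal
        gap = np.arcsin(X) - X                    # entrywise arcsin
        assert np.linalg.eigvalsh(gap).min() >= -1e-9
    print("arcsin(X) - X was PSD in all trials")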

Grothendieck inequality

Assume $\mathrm{diag}(A) = 0$. The support graph $G_A$ has as edges the pairs $ij$ with $A_{ij} \ne 0$.

Definition: the Grothendieck constant $K(G)$ of a graph $G$ is the smallest constant $K$ for which $\mathrm{sdp}(A) \le K \cdot \mathrm{ip}(A)$ for all $A \in \mathcal{S}^n$ with $G_A \subseteq G$.

Theorem ([Grothendieck 53], [Krivine 77], [Alon-Makarychev-Makarychev-Naor 05]):
For $G$ complete bipartite: $\frac{\pi}{2} \le K(G) \le \frac{\pi}{2 \ln(1 + \sqrt 2)} \approx 1.782$.
In general: $\Omega(\log \omega(G)) \le K(G) \le O(\log \vartheta(\bar G))$.

Sketch of proof for Krivine's upper bound $K(K_{n,m}) \le \frac{\pi}{2 \ln(1 + \sqrt 2)} = \frac{\pi}{2c}$, where $c := \ln(1 + \sqrt 2)$:

1. Let $A \in \mathbb{R}^{n \times m}$. Let $u_i$ ($i \le n$) and $v_j$ ($j \le m$) be unit vectors in $H$ maximizing $\mathrm{sdp}(A) = \sum_{i \le n, \, j \le m} a_{ij} \, u_i^T v_j$.
2. Construct new unit vectors $S(u_i), T(v_j) \in \hat H$ satisfying $\arcsin(S(u_i)^T T(v_j)) = c \, u_i^T v_j$.
3. Pick a random unit vector $r \in \hat H$, and define the $\pm 1$ vectors $x, y$ by $x_i = \mathrm{sgn}(r^T S(u_i))$, $y_j = \mathrm{sgn}(r^T T(v_j))$.
4. Analysis:
$\mathbb{E}(x^T A y) = \sum_{i,j} a_{ij} \, \mathbb{E}(x_i y_j) = \sum_{i,j} a_{ij} \, \frac{2}{\pi} \arcsin(S(u_i)^T T(v_j)) = \sum_{i,j} a_{ij} \, \frac{2}{\pi} \, c \, u_i^T v_j = \frac{2c}{\pi} \, \mathrm{sdp}(A).$

Proof (continued): Step 2

Given unit vectors $u, v \in H$, construct $S(u), T(v) \in \hat H$ satisfying $\arcsin(S(u)^T T(v)) = c \, u^T v$.

Recall $\sin x = \sum_{k \ge 0} (-1)^k \frac{x^{2k+1}}{(2k+1)!}$ and $\sinh x = \sum_{k \ge 0} \frac{x^{2k+1}}{(2k+1)!}$, and set $c := \sinh^{-1}(1) = \ln(1 + \sqrt 2)$.

In $\hat H := \bigoplus_{k \ge 0} H^{\otimes (2k+1)}$, set
$S(u) = \Big( \sqrt{\tfrac{c^{2k+1}}{(2k+1)!}} \; u^{\otimes(2k+1)} \Big)_k, \qquad T(v) = \Big( (-1)^k \sqrt{\tfrac{c^{2k+1}}{(2k+1)!}} \; v^{\otimes(2k+1)} \Big)_k.$

Then $S(u)^T T(v) = \sum_k (-1)^k \frac{c^{2k+1}}{(2k+1)!} (u^T v)^{2k+1} = \sin(c \, u^T v)$, and thus $\arcsin(S(u)^T T(v)) = c \, u^T v$. (The choice $c = \sinh^{-1}(1)$ makes $\|S(u)\|^2 = \|T(v)\|^2 = \sinh(c) = 1$, so $S(u)$ and $T(v)$ are indeed unit vectors.)

A reformulation of the theta number

Theorem [Alon-Makarychev-Makarychev-Naor 05]: the smallest constant $C$ for which $\mathrm{sdp}(-A) \le C \cdot \mathrm{sdp}(A)$ for all $A \in \mathcal{S}^n$ with $G_A \subseteq G$ is $C = \vartheta(\bar G) - 1$.

Geometrically, let $\mathcal{E}_n := \{ X \in \mathcal{S}^n \mid X \succeq 0, \; \mathrm{diag}(X) = e \}$ (the elliptope) and let $\mathcal{E}(G) \subseteq \mathbb{R}^E$ be the projection of $\mathcal{E}_n$ onto the edge set of $G$.

Theorem [AMMN]: $-\mathcal{E}(G) \subseteq (\vartheta(\bar G) - 1) \, \mathcal{E}(G)$.

Link with matrix completion

Given a partial $n \times n$ matrix whose entries are specified on the diagonal (say, all equal to 1) and on a subset $E$ of positions (given by $x \in \mathbb{R}^E$), decide whether it can be completed to a PSD matrix; equivalently, decide whether $x \in \mathcal{E}(G)$.

A necessary condition [clique condition]: each fully specified principal submatrix is PSD.

The clique condition is sufficient if and only if $G$ is a chordal graph (i.e., $G$ has no induced circuit of length $\ge 4$).

Example: the partial matrix
$\begin{pmatrix} 1 & 1 & ? & -1 \\ 1 & 1 & 1 & ? \\ ? & 1 & 1 & 1 \\ -1 & ? & 1 & 1 \end{pmatrix}$
(with specified entries on a 4-circuit, which is not chordal) satisfies the clique condition but is not completable to a PSD matrix.

Another necessary condition

Fact: for $a, b, c \in [0, \pi]$,
$\begin{pmatrix} 1 & \cos a & \cos b \\ \cos a & 1 & \cos c \\ \cos b & \cos c & 1 \end{pmatrix} \succeq 0 \iff a + b + c \le 2\pi, \;\; a \le b + c, \;\; b \le a + c, \;\; c \le a + b.$

Write $x = \cos a$ (entrywise) for some $a \in [0, \pi]^E$. If $x \in \mathcal{E}(G)$ then [metric condition]
$a(F) - a(C \setminus F) \le \pi (|F| - 1) \quad \text{for every circuit } C \text{ and } F \subseteq C \text{ with } |F| \text{ odd}.$

The metric condition is sufficient if and only if $G$ has no $K_4$ minor.

Example (on $K_4$): the matrix
$\begin{pmatrix} 1 & -1/2 & -1/2 & -1/2 \\ -1/2 & 1 & -1/2 & -1/2 \\ -1/2 & -1/2 & 1 & -1/2 \\ -1/2 & -1/2 & -1/2 & 1 \end{pmatrix}$
is not PSD (it has eigenvalue $-1/2$), so $x = -\frac12 (1, \ldots, 1) \notin \mathcal{E}(K_4)$, while $a = \frac{2\pi}{3} (1, \ldots, 1)$ satisfies the triangle inequalities.

Geometrically

$\mathrm{CUT}^{\pm 1}(G) \subseteq \mathcal{E}(G)$, with equality iff $G$ has no $K_3$ minor.
$\mathcal{E}(G) \subseteq \cos(\pi \, \mathrm{MET}^{01}(G))$, with equality iff $G$ has no $K_4$ minor.
$\mathcal{E}(G) \subseteq \cos(\pi \, \mathrm{CUT}^{01}(G))$, with equality iff $G$ has no $K_4$ minor.

The Goemans-Williamson randomized rounding argument shows: if $v_1, \ldots, v_n$ are unit vectors and $a_{ij} := \arccos(v_i^T v_j)$ are their pairwise angles, then
$\sum_{1 \le i < j \le n} c_{ij} \, \frac{a_{ij}}{\pi} \le c_0$
whenever $c^T z \le c_0$ is a valid inequality for the cuts of $K_n$.

Extension to max k-cut [Frieze-Jerrum 95]

Max k-cut: given $G = (V, E)$, $w \in \mathbb{R}^E_+$ and $k \ge 2$, find a partition $P = (S_1, \ldots, S_k)$ maximizing $w(P) = \sum_{e \in E \text{ cut by } P} w_e$.

Pick unit vectors $a_1, \ldots, a_k \in \mathbb{R}^k$ with $a_i^T a_j = -\frac{1}{k-1}$ for $i \ne j$.

Model max k-cut:
$\mathrm{mc}_k(G) = \max \; \frac{k-1}{k} \sum_{ij \in E} w_{ij} (1 - x_i^T x_j) \;\; \text{s.t.} \;\; x_1, \ldots, x_n \in \{a_1, \ldots, a_k\}.$

SDP relaxation:
$\mathrm{sdp}_k(G) = \max \; \frac{k-1}{k} \sum_{ij \in E} w_{ij} (1 - v_i^T v_j) \;\; \text{s.t.} \;\; v_i \text{ unit vectors}, \; v_i^T v_j \ge -\frac{1}{k-1} \; (i \ne j).$

Randomized rounding (see the sketch below): pick $k$ independent random unit vectors $r_1, \ldots, r_k$, and form the partition $P = (S_1, \ldots, S_k)$ with $S_h = \{ i \mid v_i^T r_h \ge v_i^T r_{h'} \; \forall h' \}$.
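A minimal cvxpy/numpy sketch (my own illustration) of this scheme for max 3-cut on $K_4$, whose maximum 3-cut has weight 5:

    import cvxpy as cp
    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 4, 3
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)]  # K_4, unit weights

    # SDP relaxation: X PSD, diag(X) = e, X_ij >= -1/(k-1).
    X = cp.Variable((n, n), PSD=True)
    cons = [cp.diag(X) == 1] + [X[i, j] >= -1.0 / (k - 1) for (i, j) in edges]
    obj = (k - 1) / k * sum(1 - X[i, j] for (i, j) in edges)
    sdp_val = cp.Problem(cp.Maximize(obj), cons).solve()

    # Vectors v_i from X; assign i to the part h whose r_h maximizes v_i^T r_h.
    w, Q = np.linalg.eigh(X.value)
    V = Q @ np.diag(np.sqrt(np.clip(w, 0, None)))
    R = rng.standard_normal((k, n))               # k independent random directions
    labels = np.argmax(V @ R.T, axis=1)
    cut = sum(1 for (i, j) in edges if labels[i] != labels[j])
    print(sdp_val, cut)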

Analysis

The probability that an edge $ij$ is not cut, i.e., that $v_i$ and $v_j$ are both closest to the same $r_h$, equals $k$ times a function $f(v_i^T v_j)$. Hence
$\mathbb{E}(w(P)) = \sum_{ij \in E} w_{ij} \big( 1 - k f(v_i^T v_j) \big) = \sum_{ij \in E} w_{ij} \, \frac{k-1}{k} (1 - v_i^T v_j) \cdot \frac{1 - k f(v_i^T v_j)}{\frac{k-1}{k} (1 - v_i^T v_j)} \ge \alpha_k \cdot \mathrm{sdp}_k(G),$
where
$\alpha_k := \min_{-\frac{1}{k-1} \le t < 1} \frac{1 - k f(t)}{\frac{k-1}{k} (1 - t)}.$

$\alpha_2 = \alpha_{GW} \approx 0.878$: the GW approximation ratio for max-cut.
$\alpha_3 = \frac{7}{12} + \frac{3}{4\pi^2} \arccos^2(-1/4) > 0.836$ [de Klerk et al.]
$\alpha_{100} > 0.99$.