MIT 6.972 Algebraic techniques and semidefinite optimization        February 14, 2006

Lecture 3

Lecturer: Pablo A. Parrilo        Scribe: Pablo A. Parrilo

In this lecture, we will discuss one of the most important applications of semidefinite programming, namely its use in the formulation of convex relaxations of nonconvex optimization problems. We will present the results from several different, but complementary, points of view. These will also serve us as starting points for the generalizations to be presented later in the course.

We will discuss first the case of binary quadratic optimization, since in this case the notation is simpler, and it perfectly illustrates many of the issues appearing in more complicated problems. Afterwards, a more general formulation containing arbitrary linear and quadratic constraints will be presented.

1 Binary optimization

Binary (or Boolean) quadratic optimization is a classical combinatorial optimization problem. In the version we consider, we want to minimize a quadratic function, where the decision variables can only take the values $\pm 1$. In other words, we are minimizing an (indefinite) quadratic form over the vertices of an $n$-dimensional hypercube. The problem is formally expressed as:

\[
\begin{aligned}
\text{minimize} \quad & x^T Q x \\
\text{s.t.} \quad & x_i \in \{-1, 1\}
\end{aligned}
\tag{1}
\]

where $Q \in \mathcal{S}^n$. There are many well-known problems that can be naturally written in the form above. Among these, we mention the maximum cut problem (MAXCUT) discussed below, the knapsack problem, the linear quadratic regulator (LQR) control problem with binary inputs, etc.

Notice that we can model the Boolean constraints using quadratic equations, i.e.,

\[
x_i^2 - 1 = 0 \quad \Longleftrightarrow \quad x_i \in \{-1, 1\}.
\]

These $n$ quadratic equations define a finite set, with an exponential number of elements, namely all the $n$-tuples with entries in $\{-1, 1\}$. There are exactly $2^n$ points in this set, so a direct enumeration approach to (1) is computationally prohibitive when $n$ is large (already for $n = 30$, we have $2^n \approx 10^9$). We can thus write the equivalent polynomial formulation:

\[
\begin{aligned}
\text{minimize} \quad & x^T Q x \\
\text{s.t.} \quad & x_i^2 = 1
\end{aligned}
\tag{2}
\]

We will denote the optimal value and optimal solution of this problem as $f^\star$ and $x^\star$, respectively. It is well known that the decision version of this problem is NP-complete (e.g., [GJ79]). Notice that this is true even if the matrix $Q$ is positive definite (i.e., $Q \succ 0$), since we can always make $Q$ positive definite by adding to it a constant multiple of the identity (this only shifts the objective by a constant).

Example 1 (MAXCUT). The maximum cut (MAXCUT) problem consists in finding a partition of the nodes of a graph $G = (V, E)$ into two disjoint sets $V_1$ and $V_2$ ($V_1 \cap V_2 = \emptyset$, $V_1 \cup V_2 = V$), in such a way as to maximize the number of edges that have one endpoint in $V_1$ and the other in $V_2$. It has important practical applications, such as optimal circuit layout. The decision version of this problem (does there exist a cut with value greater than or equal to $K$?) is NP-complete [GJ79].

We can easily rewrite the MAXCUT problem as a binary optimization problem. A standard formulation (for the weighted problem) is the following:

\[
\max_{y_i \in \{-1,1\}} \; \frac{1}{4} \sum_{i,j} w_{ij} \, (1 - y_i y_j), \tag{3}
\]

where $w_{ij}$ is the weight corresponding to the $(i,j)$ edge, and is zero if the nodes $i$ and $j$ are not connected. The constraints $y_i \in \{-1,1\}$ are equivalent to the quadratic constraints $y_i^2 = 1$. We can easily convert the MAXCUT formulation into binary quadratic programming. Removing the constant term, and changing the sign, the original problem is clearly equivalent to:

\[
\min_{y_i^2 = 1} \; \sum_{i,j} w_{ij} \, y_i y_j. \tag{4}
\]
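To make the $2^n$ enumeration concrete, here is a minimal brute-force sketch in Python; the helper name is ours, not from the notes, and the loop is only viable for small $n$, which is precisely what motivates the relaxations below.

```python
import itertools
import numpy as np

def brute_force_binary_qp(Q):
    """Minimize x^T Q x over all 2^n sign vectors x in {-1,+1}^n."""
    n = Q.shape[0]
    best_val, best_x = np.inf, None
    for signs in itertools.product([-1.0, 1.0], repeat=n):
        x = np.array(signs)
        val = x @ Q @ x
        if val < best_val:
            best_val, best_x = val, x
    return best_val, best_x
```

Already around $n \approx 30$ this loop needs on the order of $10^9$ objective evaluations, matching the count in the text.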

1.1 Semidefinite relaxations

Computing good solutions to the binary optimization problem given in (2) is a quite difficult task, so it is of interest to produce accurate bounds on its optimal value. As in all minimization problems, upper bounds can be directly obtained from feasible points. In other words, if $\bar{x} \in \mathbb{R}^n$ has entries equal to $\pm 1$, it always holds that $f^\star \le \bar{x}^T Q \bar{x}$ (of course, for a poorly chosen $\bar{x}$, this upper bound may be very loose).

To prove lower bounds, we need a different technique. There are several approaches to do this, but as we will see in detail in the next sections, many of them will turn out to be exactly equivalent in the end. Indeed, many of these different approaches will yield a characterization of a lower bound in terms of the following primal-dual pair of semidefinite programming problems:

\[
\begin{array}{llcll}
\text{minimize} & \operatorname{Tr} QX & \qquad\qquad & \text{maximize} & \operatorname{Tr} \Lambda \\
\text{s.t.} & X_{ii} = 1 & & \text{s.t.} & Q - \Lambda \succeq 0 \\
 & X \succeq 0 & & & \Lambda \ \text{diagonal}
\end{array}
\tag{5}
\]

In the next sections, we will derive these SDPs several times, in a number of different ways. Let us notice here first that for this primal-dual pair of SDPs, strong duality always holds, and both problems achieve their corresponding optimal solutions (why?).

1.2 Lagrangian duality

A general approach to obtain lower bounds on the value of general (non)convex minimization problems is to use Lagrangian duality. As we have seen, the original Boolean minimization problem can be written as:

\[
\begin{aligned}
\text{minimize} \quad & x^T Q x \\
\text{s.t.} \quad & x_i^2 = 1
\end{aligned}
\tag{6}
\]

For notational convenience, let $\Lambda := \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$. Then, the Lagrangian function can be written as:

\[
L(x, \lambda) = x^T Q x - \sum_{i=1}^n \lambda_i (x_i^2 - 1) = x^T (Q - \Lambda) x + \operatorname{Tr} \Lambda.
\]

For the dual function $g(\lambda) := \inf_x L(x, \lambda)$ to be bounded below, we need the implicit constraint that the matrix $Q - \Lambda$ must be positive semidefinite. In this case, the infimum over $x$ is achieved at $x = 0$, and thus we obtain a lower bound given by the solution of the SDP:

\[
\begin{aligned}
\text{maximize} \quad & \operatorname{Tr} \Lambda \\
\text{s.t.} \quad & Q - \Lambda \succeq 0
\end{aligned}
\tag{7}
\]

This is exactly the dual side of the SDP in (5).
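For concreteness, here is one way the pair (5) could be set up with the cvxpy modeling library. This is a sketch under the assumption that cvxpy and an SDP-capable solver are installed; the function name is ours.

```python
import cvxpy as cp
import numpy as np

def binary_qp_sdp_bounds(Q):
    """Solve the primal-dual pair (5); both values agree by strong duality."""
    n = Q.shape[0]
    # Primal: minimize Tr(QX)  s.t.  X_ii = 1, X PSD.
    X = cp.Variable((n, n), symmetric=True)
    primal = cp.Problem(cp.Minimize(cp.trace(Q @ X)),
                        [cp.diag(X) == 1, X >> 0])
    # Dual: maximize Tr(Lambda)  s.t.  Q - Lambda PSD, Lambda diagonal.
    lam = cp.Variable(n)
    dual = cp.Problem(cp.Maximize(cp.sum(lam)),
                      [Q - cp.diag(lam) >> 0])
    return primal.solve(), dual.solve()
```

The returned pair of optimal values gives the same lower bound on $f^\star$, computed from the two sides of (5).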

[Figure 1: The ellipsoids $E_1$ and $E_2$.]

1.3 Underestimator of the objective

A different but related interpretation of the SDP relaxation (5) is through the notion of an underestimator of the objective function. Indeed, the quadratic function $x^T \Lambda x$ is an easily optimizable function that is guaranteed to lie below the desired objective $x^T Q x$. To see this, notice that for any feasible $x$ we have

\[
x^T Q x \;\ge\; x^T \Lambda x \;=\; \sum_{i=1}^n \Lambda_{ii} x_i^2 \;=\; \operatorname{Tr} \Lambda,
\]

where the first inequality follows from $Q - \Lambda \succeq 0$, the second equation holds since the matrix $\Lambda$ is diagonal, and the third one holds since $x_i \in \{+1, -1\}$.

There is also a nice corresponding geometric interpretation. For simplicity, we assume without loss of generality that $Q$ is positive definite. Then, the problem (1) can be interpreted as finding the largest value of $\gamma$ for which the ellipsoid $\{x \in \mathbb{R}^n \mid x^T Q x \le \gamma\}$ does not contain a vertex of the unit hypercube. Consider now the two ellipsoids in $\mathbb{R}^n$ defined by:

\[
\begin{aligned}
E_1 &= \{x \in \mathbb{R}^n \mid x^T Q x \le \operatorname{Tr} \Lambda\} \\
E_2 &= \{x \in \mathbb{R}^n \mid x^T \Lambda x \le \operatorname{Tr} \Lambda\}.
\end{aligned}
\]

The principal axes of the ellipsoid $E_2$ are aligned with the coordinate axes (since $\Lambda$ is diagonal), and furthermore its boundary contains all the vertices of the unit hypercube. Also, it is easy to see that the condition $Q \succeq \Lambda$ implies $E_1 \subseteq E_2$. With these facts, it is easy to understand the related problem that the SDP relaxation is solving: dilating $E_1$ as much as possible, while ensuring the existence of another ellipsoid $E_2$ with coordinate-aligned axes and touching the hypercube at all $2^n$ vertices; see Figure 1 for an illustration.
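The underestimator property is easy to test numerically. A small sketch, assuming `lam` holds the diagonal of an optimal $\Lambda$ from the dual SDP (7):

```python
import numpy as np

def check_underestimator(Q, lam, trials=1000, seed=0):
    """Check that Q - Lambda is PSD (up to tolerance) and that
    x^T Q x >= Tr(Lambda) on random hypercube vertices."""
    rng = np.random.default_rng(seed)
    n = Q.shape[0]
    assert np.linalg.eigvalsh(Q - np.diag(lam)).min() >= -1e-8
    for _ in range(trials):
        x = rng.choice([-1.0, 1.0], size=n)
        assert x @ Q @ x >= lam.sum() - 1e-8
```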

1.4 Probabilistic interpretation

To be written. (ToDo)

[Figure 2: The three-dimensional spectraplex, i.e., the set of $3 \times 3$ positive semidefinite matrices with unit diagonal.]

1.5 Lifting and rank relaxation

We present yet another derivation of the SDP relaxations, this time focused on the primal side. Recall the original formulation of the optimization problem (2). Define now $X := xx^T$. By construction, the matrix $X \in \mathcal{S}^n$ satisfies $X \succeq 0$, $X_{ii} = x_i^2 = 1$, and has rank one. Conversely, any matrix $X$ with $X \succeq 0$, $X_{ii} = 1$, $\operatorname{rank} X = 1$ necessarily has the form $X = xx^T$ for some $\pm 1$ vector $x$ (why?). Furthermore, by the cyclic property of the trace, we can express the objective function directly in terms of the matrix $X$, via:

\[
x^T Q x = \operatorname{Tr} x^T Q x = \operatorname{Tr} Q x x^T = \operatorname{Tr} QX.
\]

As a consequence, the original problem (2) can be exactly rewritten as:

\[
\begin{aligned}
\text{minimize} \quad & \operatorname{Tr} QX \\
\text{s.t.} \quad & X_{ii} = 1, \quad X \succeq 0 \\
& \operatorname{rank}(X) = 1.
\end{aligned}
\]

This is almost an SDP problem (all the constraints are either linear or conic), except for the rank-one constraint on $X$. Since this is a minimization problem, a lower bound on the solution can be obtained by dropping the (nonconvex) rank constraint, which enlarges the feasible set.

A useful interpretation is in terms of a nonlinear lifting to a higher-dimensional space. Indeed, rather than solving the original problem in terms of the $n$-dimensional vector $x$, we are instead solving for the $n \times n$ matrix $X$, effectively converting the problem from $\mathbb{R}^n$ to $\mathcal{S}^n$ (which has dimension $\binom{n+1}{2}$).

Observe that this line of reasoning immediately shows that if we find an optimal solution $X$ of the SDP (5) that has rank one, then we have solved the original problem. Indeed, in this case the upper and lower bounds on the solution coincide. As a graphical illustration, in Figure 2 we depict the set of $3 \times 3$ positive semidefinite matrices of unit diagonal. The rank-one matrices correspond to the four vertices of this convex set, and are in (two-to-one) correspondence with the eight 3-vectors with $\pm 1$ entries.

In general, it is not the case that the optimal solution of the SDP relaxation will be rank one. However, as we will see in the next section, it is possible to use rounding schemes to obtain nearby rank-one solutions. Furthermore, in some cases, it is possible to do so while obtaining some approximation guarantees on the quality of the rounded solutions.
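Numerically, "rank one" can only be checked up to a tolerance. A minimal sketch of this recovery step (helper name and tolerance are illustrative, not from the notes):

```python
import numpy as np

def recover_if_rank_one(X, tol=1e-6):
    """If the SDP solution X is numerically rank one, then X = x x^T for
    some x in {-1,+1}^n; recover x from the leading eigenvector."""
    w, U = np.linalg.eigh(X)       # eigenvalues in ascending order
    if w[-2] < tol:                # all but the largest eigenvalue vanish
        return np.sign(U[:, -1])   # leading eigenvector is x / sqrt(n)
    return None                    # higher rank: fall back to rounding
```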

2 Bounds: Goemans-Williamson and Nesterov

So far, our use of the SDP relaxation (5) has been limited to providing only a posteriori bounds on the optimal solution of the original minimization problem. However, two desirable features are missing:

- Approximation guarantees: is it possible to prove general properties on the quality of the bounds obtained by SDP?
- Feasible solutions: can we (somehow) use the SDP relaxations to provide not just bounds, but actual feasible points with good (or optimal) values of the objective?

As we will see, it turns out that both questions can be answered in the positive. As shown by Goemans and Williamson [GW95] in the MAXCUT case, and by Nesterov in a more general setting, we can actually achieve both of these objectives by randomly rounding, in an appropriate manner, the solution $X$ of this relaxation. We discuss these results below.

2.1 Goemans and Williamson rounding

In their celebrated MAXCUT paper, Goemans and Williamson developed the following randomized method for finding a good feasible cut from the solution of the SDP (a code sketch is given at the end of this subsection).

- Factorize $X$ as $X = V^T V$, where $V = [v_1 \, \ldots \, v_n] \in \mathbb{R}^{r \times n}$, and $r$ is the rank of $X$. Then $X_{ij} = v_i^T v_j$, and since $X_{ii} = 1$ this factorization gives $n$ vectors $v_i$ on the unit sphere in $\mathbb{R}^r$. Instead of assigning either $+1$ or $-1$ to each variable, we have assigned to each a point on the unit sphere in $\mathbb{R}^r$.
- Now, choose a random hyperplane in $\mathbb{R}^r$, and assign to each variable $x_i$ either a $+1$ or a $-1$, depending on which side of the hyperplane the point $v_i$ lies.

It turns out that this procedure gives a solution that, on average, is quite close to the value of the SDP bound. We will compute the expected value of the rounded solution in a slightly different form from the original G-W argument, but one that will be helpful later. The random hyperplane can be characterized by its normal vector $p$, which is chosen to be uniformly distributed on the unit sphere (e.g., by suitably normalizing a standard multivariate Gaussian random variable). Then, according to the description above, the rounded solution is given by $x_i = \operatorname{sign}(p^T v_i)$. The expected value of this solution can then be written as:

\[
\mathbb{E}_p[x^T Q x] = \sum_{i,j} Q_{ij} \, \mathbb{E}_p[x_i x_j] = \sum_{i,j} Q_{ij} \, \mathbb{E}_p[\operatorname{sign}(p^T v_i) \operatorname{sign}(p^T v_j)].
\]

We can easily compute the value of this expectation. Consider the plane spanned by $v_i$ and $v_j$, and let $\theta_{ij}$ be the angle between these two vectors. Then, it is easy to see that the desired expectation is equal to the probability that both points are on the same side of the hyperplane, minus the probability that they are on different sides. These probabilities are $1 - \theta_{ij}/\pi$ and $\theta_{ij}/\pi$, respectively. Thus, the expected value of the rounded solution is exactly:

\[
\sum_{i,j} Q_{ij} \left(1 - \frac{2\theta_{ij}}{\pi}\right) = \sum_{i,j} Q_{ij} \left(1 - \frac{2}{\pi} \arccos(v_i^T v_j)\right) = \frac{2}{\pi} \sum_{i,j} Q_{ij} \arcsin X_{ij}. \tag{8}
\]

Notice that the expression is of course well defined, since if $X$ is PSD and has unit diagonal, all its entries are bounded in absolute value by 1. This result exactly characterizes the expected value of the rounding procedure, as a function of the optimal solution of the SDP. We would like, however, to directly relate this quantity to the optimal solution of the original optimization problem. For this, we will need additional assumptions on the matrix $Q$. We discuss next two of the most important results in this direction.
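The rounding step is short enough to state in code. Below is a minimal sketch, not from the original notes; the function name and the use of a full eigendecomposition (rather than a rank-$r$ factorization) are our implementation choices, and the best cut over several random hyperplanes is kept.

```python
import numpy as np

def gw_round(X, Q, trials=100, seed=0):
    """Random-hyperplane rounding of a PSD, unit-diagonal SDP solution X.
    Returns the best +-1 vector found over `trials` random hyperplanes."""
    rng = np.random.default_rng(seed)
    w, U = np.linalg.eigh(X)
    V = (U * np.sqrt(np.clip(w, 0, None))).T  # columns v_i give X_ij = v_i^T v_j
    best_x, best_val = None, np.inf
    for _ in range(trials):
        p = rng.standard_normal(V.shape[0])   # normal vector of a random hyperplane
        x = np.sign(p @ V)
        x[x == 0] = 1.0                       # sign(0): break ties arbitrarily
        val = x @ Q @ x
        if val < best_val:
            best_val, best_x = val, x
    return best_x, best_val
```

A single hyperplane already achieves (8) in expectation; taking the best of several trials can only help.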

[Figure 3: Bound on the inverse cosine function: $\frac{1}{\pi}\arccos t$ versus $\alpha \cdot \frac{1}{2}(1-t)$, for $\alpha \approx 0.878$.]

2.2 MAXCUT bound

Recall from (3) that for the MAXCUT problem, the objective function does not only include the quadratic part, but there is actually a constant term:

\[
\frac{1}{4} \sum_{i,j} w_{ij} \, (1 - y_i y_j).
\]

The expected value of the cut is then:

\[
c_{\text{sdp-expected}} = \frac{1}{4} \sum_{i,j} w_{ij} \left(1 - \frac{2}{\pi} \arcsin X_{ij}\right) = \frac{1}{2\pi} \sum_{i,j} w_{ij} \arccos X_{ij}.
\]

On the other hand, the solution of the SDP gives an upper bound on the cut capacity equal to:

\[
c_{\text{sdp-upper-bound}} = \frac{1}{4} \sum_{i,j} w_{ij} \, (1 - X_{ij}).
\]

To relate these two quantities, we look for a constant $\alpha$ such that

\[
\alpha \cdot \frac{1}{2}(1 - t) \;\le\; \frac{1}{\pi} \arccos t \qquad \text{for all } t \in [-1, 1].
\]

The best possible (i.e., largest) such constant is $\alpha \approx 0.878$; see Figure 3. So we have

\[
\alpha \, c_{\text{sdp-upper-bound}} \;\le\; \frac{1}{2\pi} \sum_{i,j} w_{ij} \arccos X_{ij} \;=\; c_{\text{sdp-expected}}.
\]

Notice that here we have used the nonnegativity of the weights (i.e., $w_{ij} \ge 0$). Thus, so far we have the inequality $\alpha \, c_{\text{sdp-upper-bound}} \le c_{\text{sdp-expected}}$. Also, clearly $c_{\text{sdp-expected}} \le c_{\max}$, since the expectation of the cut values produced by rounding cannot exceed the maximum cut, and $c_{\max} \le c_{\text{sdp-upper-bound}}$, since (5) is a relaxation. Putting it all together, we can sandwich the value of the relaxation as follows:

\[
\alpha \, c_{\text{sdp-upper-bound}} \;\le\; c_{\text{sdp-expected}} \;\le\; c_{\max} \;\le\; c_{\text{sdp-upper-bound}}.
\]
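The constant $\alpha$ can be recovered numerically by minimizing the ratio of the two sides over $t \in (-1, 1)$. A small sketch, assuming scipy is available:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# alpha = min over t in (-1,1) of [ (1/pi) arccos(t) ] / [ (1-t)/2 ]
res = minimize_scalar(lambda t: (2.0 / np.pi) * np.arccos(t) / (1.0 - t),
                      bounds=(-1.0 + 1e-9, 1.0 - 1e-9), method="bounded")
print(res.fun)  # approximately 0.878, attained near t = -0.689
```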

2.3 Nesterov's result

A result by Nesterov generalizes the MAXCUT bound described above to a larger class of problems. The original formulation is for the case of binary maximization, and applies to the case when the matrix $A$ is positive semidefinite. Since the problem is homogeneous, the optimal value is guaranteed to be nonnegative. As we have seen, the expected value of the solution after randomized rounding is given by (8). Since $X$ is positive semidefinite, it follows from the nonnegativity of the Taylor series of $\arcsin(t) - t$ and the Schur product theorem that $\arcsin[X] \succeq X$, where the arcsin function is applied componentwise. This inequality can be combined with (8) to give the bounds:

\[
\frac{2}{\pi} \, f_{\text{sdp-upper-bound}} \;\le\; f_{\text{sdp-expected}} \;\le\; f_{\max} \;\le\; f_{\text{sdp-upper-bound}},
\]

where $2/\pi \approx 0.636$. For more details, see [BN01, Section 4.3.4]. Among others, the paper [Meg01] presents several new results, as well as a review of many of the available approximation schemes.

3 Linearly constrained problems

In this section we extend the earlier results to general quadratic optimization problems under linear and quadratic constraints. For notational simplicity, we write the constraints in homogeneous form, i.e., in terms of the vector $x = \begin{bmatrix} 1 \\ y \end{bmatrix}$. The general primal form of the optimization problems we are concerned with is

\[
\begin{aligned}
\min \quad & x^T Q x \\
\text{s.t.} \quad & x^T A_i x \ge 0 \\
& Bx \ge 0 \\
& x = \begin{bmatrix} 1 \\ y \end{bmatrix}.
\end{aligned}
\]

The corresponding primal and dual SDP relaxations are given by

\[
\begin{array}{llcll}
\min & \operatorname{Tr} QX & \qquad\qquad & \max & \gamma \\
\text{s.t.} & \operatorname{Tr} A_i X \ge 0 & & \text{s.t.} & Q \succeq \gamma E_{11} + \sum_i \lambda_i A_i + B^T N B \\
& B X B^T \ge 0 & & & \lambda_i \ge 0 \\
& \operatorname{Tr} E_{11} X = 1 & & & N \ge 0, \quad N_{ii} = 0 \\
& X \succeq 0 & & &
\end{array}
\tag{9}
\]

Here $E_{11}$ denotes the matrix with a 1 in the (1,1) component, and all the rest of its entries equal to zero. The dual variables $\lambda_i$ can be interpreted as Lagrange multipliers associated with the quadratic constraints of the primal problem, while $N$ (a componentwise-nonnegative matrix) corresponds to the pairwise products of the linear constraints.
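A sketch of the primal relaxation in (9), again with cvxpy; the function name is illustrative, `A_list` collects the matrices $A_i$, and the inequalities on `B @ X @ B.T` are componentwise.

```python
import cvxpy as cp

def qcqp_sdp_relaxation(Q, A_list, B):
    """Primal SDP relaxation (9) of min x^T Q x s.t. x^T A_i x >= 0,
    B x >= 0, where x = [1; y] is the homogenized variable."""
    n = Q.shape[0]
    X = cp.Variable((n, n), symmetric=True)
    constraints = [X >> 0, X[0, 0] == 1]                   # Tr(E11 X) = 1
    constraints += [cp.trace(A @ X) >= 0 for A in A_list]  # quadratic constraints
    constraints += [B @ X @ B.T >= 0]                      # products of linear constraints
    problem = cp.Problem(cp.Minimize(cp.trace(Q @ X)), constraints)
    problem.solve()
    return problem.value, X.value
```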

References

[BN01] A. Ben-Tal and A. Nemirovski. Lectures on Modern Convex Optimization. MPS/SIAM Series on Optimization. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2001.

[GJ79] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Company, 1979.

[GW95] M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM, 42(6):1115-1145, 1995.

[Meg01] A. Megretski. Relaxations of quadratic programs in operator theory and system analysis. In Systems, Approximation, Singular Integral Operators, and Related Topics (Bordeaux, 2000), volume 129 of Oper. Theory Adv. Appl., pages 365-392. Birkhäuser, Basel, 2001.