Convex relaxation

The art and science of convex relaxation revolves around taking a nonconvex problem that you want to solve and replacing it with a convex problem that you can actually solve; the solution to the convex program gives information about (usually a lower bound on) the solution to the original program. Usually this is done by either convexifying the constraints or convexifying the functional; we will see examples of both below.

MINCUT

Previously, we have looked at the problem of finding the minimum cut of a directed graph. We have vertices indexed by $1, \dots, N$; vertex $1$ is the source and vertex $N$ is the sink. Between each pair of vertices $(i, j)$ there is a capacity $C_{i,j} \ge 0$; if there is no edge from $i$ to $j$, we take $C_{i,j} = 0$. A cut partitions the vertices into two sets: a set $S$ which contains the source, and a set $S^c$ which contains the sink. The capacity of the cut is the sum of the capacities of all the edges that originate in $S$ and terminate in $S^c$. In the example below, we have $N = 6$, and the cut we are considering has $S = \{1, 2, 4, 5\}$:

[Figure: a directed graph on six vertices with edge capacities, with the cut $S = \{1, 2, 4, 5\}$ marked.]

The edges in this cut are the ones that cross from $S$ into $S^c = \{3, 6\}$: one edge into vertex $3$ and two edges into vertex $6$ (one of which is $5 \to 6$). The capacity of this cut is $2 + 3 + 2 = 7$. In general, the capacity associated with a cut $S$ is
$$\sum_{i \in S,\ j \in S^c} C_{i,j}.$$
If we take the vector $\nu \in \mathbb{R}^N$ as
$$\nu_i = \begin{cases} 1, & i \in S, \\ 0, & i \in S^c, \end{cases}$$
then we can write the problem of finding the minimum cut as
$$\begin{array}{ll} \underset{\nu}{\text{minimize}} & \displaystyle\sum_{i=1}^N \sum_{j=1}^N C_{i,j} \max(\nu_i - \nu_j, 0) \\ \text{subject to} & \nu_i \in \{0, 1\}, \quad i = 1, \dots, N \\ & \nu_1 = 1, \quad \nu_N = 0. \end{array}$$
To make the functional linear, we introduce variables $\lambda_{i,j}$, and the minimum cut program can be rewritten as
$$\text{(MINCUT)} \qquad \begin{array}{ll} \underset{\Lambda, \nu}{\text{minimize}} & \displaystyle\sum_{i=1}^N \sum_{j=1}^N \lambda_{i,j} C_{i,j} \\ \text{subject to} & \lambda_{i,j} = \max(\nu_i - \nu_j, 0) \\ & \nu_i \in \{0, 1\} \\ & \nu_1 = 1, \quad \nu_N = 0, \end{array}$$
where it is understood that the constraints should hold for all $i, j = 1, \dots, N$. As stated, there are two things making this program nonconvex: we have non-affine equality constraints relating the $\lambda_{i,j}$ to $\nu_i$ and $\nu_j$, and we have binary constraints on the $\nu_i$. If we simply drop the integer constraint, and relax $\lambda_{i,j} = \max(\nu_i - \nu_j, 0)$ to $\lambda_{i,j} \ge \nu_i - \nu_j$ and $\lambda_{i,j} \ge 0$,

we are left with the linear program
$$\text{(LP-relax)} \qquad \begin{array}{ll} \underset{\Lambda, \nu}{\text{minimize}} & \langle \Lambda, C \rangle \\ \text{subject to} & \lambda_{i,j} \ge \nu_i - \nu_j \\ & \lambda_{i,j} \ge 0 \\ & \nu_1 = 1, \quad \nu_N = 0. \end{array}$$
Note that the domain we are optimizing over in the LP relaxation is larger than the domain in the original formulation; this means that every valid cut (every $\Lambda, \nu$ feasible for the original program) is feasible in the LP relaxation. So at the very least we know that
$$\text{LP-relax} \le \text{MINCUT}.$$
But the semi-amazing thing is that the solutions to the two programs turn out to agree. We show this by establishing that for every solution of the relaxation, there is at least one cut with value less than or equal to LP-relax. We do this by generating a random cut (with the associated probabilities carefully chosen) and showing that, in expectation, its capacity is at most LP-relax.

Let $Z$ be a uniform random variable on $[0, 1]$, and let $\Lambda, \nu$ be a solution to (LP-relax). Create a cut $S$ with the rule: if $\nu_n > Z$, then take $n \in S$.

The probability that a particular edge $i \to j$ is in this cut is
$$\mathbb{P}(i \in S,\ j \notin S) = \mathbb{P}(\nu_j \le Z < \nu_i) \le \begin{cases} \max(\nu_i - \nu_j, 0), & 1 < i, j < N, \\ 1 - \nu_j, & i = 1;\ j = 2, \dots, N - 1, \\ \nu_i, & i = 2, \dots, N - 1;\ j = N, \\ 1, & i = 1;\ j = N, \end{cases} \quad \le\ \lambda_{i,j},$$
where the last three inequalities follow simply from the constraints in (LP-relax). This cut is random, so its capacity is a random variable, and its expectation is
$$\mathbb{E}[\text{capacity}(S)] = \sum_{i,j} C_{i,j}\, \mathbb{P}(i \in S,\ j \notin S) \le \sum_{i,j} C_{i,j}\, \lambda_{i,j} = \text{LP-relax}.$$
Thus there must be a cut whose capacity is at most LP-relax. This establishes that $\text{MINCUT} \le \text{LP-relax}$. Of course, combining this with the result above means that
$$\text{MINCUT} = \text{LP-relax}.$$
This is an example of a wonderful situation where convex relaxation costs us nothing, but makes solving the program computationally tractable.
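
To make this concrete, here is a minimal sketch of (LP-relax) and the threshold rounding in Python, using numpy and cvxpy (both assumed available); the 4-node capacity matrix is made up for illustration:

```python
# Sketch: LP relaxation of MINCUT, then randomized threshold rounding.
import numpy as np
import cvxpy as cp

N = 4
C = np.zeros((N, N))                        # C[i, j] = 0 means no edge i -> j
C[0, 1], C[0, 2], C[1, 2], C[1, 3], C[2, 3] = 3.0, 2.0, 1.0, 2.0, 3.0

nu = cp.Variable(N)                         # relaxed cut indicators nu_i
lam = cp.Variable((N, N))                   # stands in for max(nu_i - nu_j, 0)
constraints = [nu[0] == 1, nu[N - 1] == 0]  # source in S, sink not in S
for i in range(N):
    for j in range(N):
        constraints += [lam[i, j] >= nu[i] - nu[j], lam[i, j] >= 0]

prob = cp.Problem(cp.Minimize(cp.sum(cp.multiply(C, lam))), constraints)
prob.solve()
print("LP-relax value:", prob.value)

# Threshold rounding: S = {n : nu_n > Z} for Z uniform on [0, 1].
rng = np.random.default_rng(0)
S = nu.value > rng.uniform()
print("rounded cut capacity:", C[np.ix_(S, ~S)].sum())
```

Since every cut has capacity at least MINCUT = LP-relax, while the expected rounded capacity is at most LP-relax, the rounded cut is (almost surely) a minimum cut, whatever $Z$ turns out to be.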

MAXCUT

A good resource for this section and the next is Ben-Tal and Nemirovski [BTN01]. This problem has a very similar setup to the MINCUT problem, but it is different in subtle ways. We are given a graph; this time the edges are undirected, and have positive weights $A_{i,j}$ associated with them. Since the graph is undirected, $A_{i,j} = A_{j,i}$, and so $A$ is symmetric. We will also assume that $A_{i,i} = 0$ for all $i$. As before, a cut partitions the vertices into two sets, $S$ and $S^c$; these sets can be arbitrary, as there is no notion of source and sink here. For example, the cut in this example:

[Figure: an undirected graph on six vertices with edge weights $A_{i,j}$, with a cut $S$ marked.]

has value $\text{cut}(S) = A_{1,2} + A_{3,4}$. The problem is to find the cut that maximizes the weights of the edges going between the two partitions.

We can specify a cut of the graph with a binary-valued vector $x$ of length $N$, where each $x_n \in \{-1, +1\}$. If $x_n = +1$ when vertex $n$ is in $S$ and $x_n = -1$ when vertex $n$ is in $S^c$, then the value of the cut is
$$\text{cut}(S) = \frac{1}{4} \sum_{i=1}^N \sum_{j=1}^N A_{i,j}\, (1 - x_i x_j).$$

Note that if $x_i \ne x_j$, then $(1 - x_i x_j) = 2$, while if $x_i = x_j$, then $(1 - x_i x_j) = 0$. The factor of $1/4$ in front comes from the fact that $(1 - x_i x_j) = 2$ for edges in the cut, and that we are counting every edge twice (from $i$ to $j$ and again from $j$ to $i$). Notice that we can write this value as a quadratic function of $x$:
$$\text{cut}(S) = \frac{1}{4}\left( \mathbf{1}^T A \mathbf{1} - x^T A x \right).$$
The MAXCUT problem is to find the cut with the largest value:
$$\text{(MAXCUT)} \qquad \begin{array}{ll} \underset{x \in \mathbb{R}^N}{\text{maximize}} & \frac{1}{4}\left( \mathbf{1}^T A \mathbf{1} - x^T A x \right) \\ \text{subject to} & x_i \in \{-1, 1\}. \end{array}$$
Right now, this looks pretty gnarly, as $A$ has no guarantee of being PSD, and we have integer constraints on $x$. We can address the first concern by rewriting this as a search for a matrix $X = x x^T$. As now $x^T A x = \langle X, A \rangle$, we have
$$\begin{array}{ll} \underset{X \in \mathbb{R}^{N \times N}}{\text{maximize}} & \frac{1}{4}\left( \mathbf{1}^T A \mathbf{1} - \langle X, A \rangle \right) \\ \text{subject to} & X \succeq 0 \\ & X_{i,i} = 1, \quad i = 1, \dots, N \\ & \operatorname{rank}(X) = 1. \end{array}$$
You should be able to convince yourself that $X$ is feasible above if and only if it can be written as $X = x x^T$, where the entries of $x$ are $\pm 1$. The recast program looks like an SDP, except for the rank constraint. The relaxation, then, is to simply drop it and solve
$$\text{(MAXCUT-relax)} \qquad \begin{array}{ll} \underset{X \in \mathbb{R}^{N \times N}}{\text{maximize}} & \frac{1}{4}\left( \mathbf{1}^T A \mathbf{1} - \langle X, A \rangle \right) \\ \text{subject to} & X \succeq 0 \\ & X_{i,i} = 1, \quad i = 1, \dots, N. \end{array}$$
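
For concreteness, here is a minimal sketch of (MAXCUT-relax) in Python with numpy and cvxpy (both assumed available); the random weight matrix is made up for illustration:

```python
# Sketch: the MAXCUT SDP relaxation, with the rank-1 constraint dropped.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
N = 6
A = rng.uniform(0.0, 1.0, (N, N))
A = np.triu(A, 1); A = A + A.T              # symmetric weights, zero diagonal

X = cp.Variable((N, N), PSD=True)           # plays the role of x x^T
objective = 0.25 * (np.sum(A) - cp.trace(A @ X))   # (1/4)(1^T A 1 - <X, A>)
prob = cp.Problem(cp.Maximize(objective), [cp.diag(X) == 1])
prob.solve()
print("MAXCUT-relax value:", prob.value)    # an upper bound on MAXCUT
```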

As we are optimizing over a larger set, the optimal value of MAXCUT-relax will in general be larger than MAXCUT:
$$\text{MAXCUT-relax} \ge \text{MAXCUT}.$$
But there is a classic result [GW95] that shows it will not be too much larger:
$$\text{MAXCUT} \ge (0.87856)\, \text{MAXCUT-relax}.$$
The argument again relies on looking at the expected value of a random cut. Let $X$ be a solution to MAXCUT-relax. Since $X$ is PSD, it can be factored as $X = V^T V$; with $v_j$ as the $j$th column of $V$, this means $X_{i,j} = \langle v_i, v_j \rangle$. Since along the diagonal we have $X_{i,i} = 1$, this means that $\|v_i\|_2 = 1$ as well. We can associate one column $v_i$ with each vertex in the original problem.

To create the cut, we draw a vector $z$ from the unit sphere (so $\|z\|_2 = 1$) uniformly at random (in practice, you could do this by drawing each entry Normal(0, 1) independently, then normalizing), and set
$$S = \{ i : \langle v_i, z \rangle \ge 0 \}.$$
It should be clear that the probability that any fixed vertex is in $S$ is $1/2$. But what is the probability that vertex $i$ and vertex $j$ are on different sides? The probability of this is simply the ratio of the angle between $v_i$ and $v_j$ to $\pi$:
$$\mathbb{P}(i \in S,\ j \notin S) + \mathbb{P}(i \notin S,\ j \in S) = \frac{\arccos \langle v_i, v_j \rangle}{\pi} = \frac{\arccos X_{i,j}}{\pi}.$$
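
Here is a matching numpy sketch of this rounding step, reusing `X` and `A` from the previous snippet; the factor $X = V^T V$ is computed from an eigendecomposition:

```python
# Sketch: Goemans-Williamson random-hyperplane rounding of the SDP solution.
import numpy as np

def round_cut(X_opt, rng):
    w, U = np.linalg.eigh(X_opt)                 # X = U diag(w) U^T
    V = (U * np.sqrt(np.clip(w, 0.0, None))).T   # columns V[:, i] are the v_i
    z = rng.standard_normal(V.shape[0])          # random direction (the sign
    x = np.sign(V.T @ z)                         #  test only needs direction)
    x[x == 0] = 1.0
    return x                                     # x[i] = +1  <=>  vertex i in S

rng = np.random.default_rng(2)
x = round_cut(X.value, rng)
print("rounded cut value:", 0.25 * np.sum(A * (1 - np.outer(x, x))))
```

In expectation, the printed cut value is at least 0.87856 times the SDP value; that is the content of the bound derived next.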

Thus the expectation of the cut value is
$$\mathbb{E}[\text{cut}(S)] = \frac{1}{2} \sum_{i=1}^N \sum_{j=1}^N A_{i,j} \left( \mathbb{P}(i \in S,\ j \notin S) + \mathbb{P}(i \notin S,\ j \in S) \right) = \frac{1}{2} \sum_{i=1}^N \sum_{j=1}^N A_{i,j}\, \frac{\arccos X_{i,j}}{\pi}.$$
There must be at least one cut that has a value greater than or equal to the mean, so we know that $\text{MAXCUT} \ge \mathbb{E}[\text{cut}(S)]$.

Let's compare the terms in this sum to those in the objective function for MAXCUT-relax. We know that the entries in $X$ have at most unit magnitude, $|X_{i,j}| \le 1$ (this follows from $X_{i,j} = \langle v_i, v_j \rangle$, $\|v_i\|_2 = 1$, and Cauchy-Schwarz), and it is a fact that
$$\frac{\arccos t}{\pi} \ge (0.87856)\, \frac{1 - t}{2}, \qquad \text{for } t \in [-1, 1].$$
Here is a little proof by MATLAB of this fact:

[Plot over $t \in [-1, 1]$: blue = $\arccos(t)/\pi$, red = $(0.87856)(1 - t)/2$; the blue curve lies above the red one.]
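
If MATLAB is not at hand, here is a numpy stand-in for the same check (a numerical verification on a grid, not a proof):

```python
# Check arccos(t)/pi >= 0.87856 * (1 - t)/2 on a fine grid of t in [-1, 1].
import numpy as np

t = np.linspace(-1.0, 1.0, 200001)
lhs = np.arccos(t) / np.pi
rhs = 0.87856 * (1.0 - t) / 2.0
print(bool(np.all(lhs >= rhs)))    # True; the best (smallest safe) constant
                                   # here is about 0.878567
```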

Thus
$$\text{MAXCUT} \ge \mathbb{E}[\text{cut}(S)] = \frac{1}{2} \sum_{i=1}^N \sum_{j=1}^N A_{i,j}\, \frac{\arccos X_{i,j}}{\pi} \ge (0.87856)\, \frac{1}{4} \sum_{i=1}^N \sum_{j=1}^N A_{i,j} (1 - X_{i,j}) = (0.87856)\, \text{MAXCUT-relax}.$$

Quadratic equality constraints

The integer constraint $x_i \in \{-1, 1\}$ in the example above might also be interpreted as a quadratic equality constraint:
$$x_i \in \{-1, 1\} \iff x_i^2 = 1.$$
As we are well aware, quadratic (or any other nonlinear) equality constraints make the feasibility region nonconvex. We consider general nonconvex quadratic programs of the form
$$\begin{array}{ll} \underset{x \in \mathbb{R}^N}{\text{minimize}} & x^T A_0 x + \langle x, b_0 \rangle + c_0 \\ \text{subject to} & x^T A_m x + \langle x, b_m \rangle + c_m = 0, \quad m = 1, \dots, M, \end{array}$$
where the $A_m$ are symmetric, but not necessarily $\succeq 0$. We will show how to recast these problems as optimization over the SDP cone with an additional (nonconvex) rank constraint. Then we will have a natural convex relaxation by dropping the rank constraint. This general methodology works for equality or (possibly nonconvex) inequality constraints, but for the sake of simplicity, we will just look at equality constraints.

We can turn a quadratic form into a trace inner product with a rank-1 matrix as follows:
$$x^T A x + b^T x + c = \begin{bmatrix} x^T & 1 \end{bmatrix} \begin{bmatrix} A & b/2 \\ b^T/2 & c \end{bmatrix} \begin{bmatrix} x \\ 1 \end{bmatrix} = \operatorname{trace}\left( \begin{bmatrix} A & b/2 \\ b^T/2 & c \end{bmatrix} X \right), \qquad X = \begin{bmatrix} x \\ 1 \end{bmatrix} \begin{bmatrix} x^T & 1 \end{bmatrix} = \begin{bmatrix} x x^T & x \\ x^T & 1 \end{bmatrix}.$$
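
A quick numpy sanity check of this identity, on random (illustrative) data:

```python
# Verify x^T A x + b^T x + c == <F, [x; 1][x; 1]^T> with F = [[A, b/2], [b^T/2, c]].
import numpy as np

rng = np.random.default_rng(3)
N = 4
A = rng.standard_normal((N, N)); A = (A + A.T) / 2.0   # symmetric, indefinite
b, c, x = rng.standard_normal(N), rng.standard_normal(), rng.standard_normal(N)

F = np.block([[A, b[:, None] / 2.0],
              [b[None, :] / 2.0, np.array([[c]])]])
z = np.append(x, 1.0)                                  # z = [x; 1]
print(x @ A @ x + b @ x + c, z @ F @ z)                # the two numbers agree
```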

This means we can write the nonconvex quadratic program as
$$\begin{array}{ll} \underset{x \in \mathbb{R}^N}{\text{minimize}} & \operatorname{trace}\left( \begin{bmatrix} A_0 & b_0/2 \\ b_0^T/2 & c_0 \end{bmatrix} X \right) \\ \text{subject to} & \operatorname{trace}\left( \begin{bmatrix} A_m & b_m/2 \\ b_m^T/2 & c_m \end{bmatrix} X \right) = 0, \quad m = 1, \dots, M. \end{array}$$
With
$$F_m = \begin{bmatrix} A_m & b_m/2 \\ b_m^T/2 & c_m \end{bmatrix},$$
we see that this program is equivalent to
$$\begin{array}{ll} \underset{X \in \mathbb{R}^{(N+1) \times (N+1)}}{\text{minimize}} & \langle X, F_0 \rangle \\ \text{subject to} & \langle X, F_m \rangle = 0, \quad m = 1, \dots, M \\ & X_{N+1, N+1} = 1 \\ & X \succeq 0 \\ & \operatorname{rank}(X) = 1 \end{array}$$
(the lower-right entry of $X = [x;\, 1][x^T\ \ 1]$ is always $1$, which is what the constraint $X_{N+1,N+1} = 1$ enforces). Again, we can get a convex relaxation simply by dropping the rank constraint.

How well this works depends on the particulars of the problem. There are certain situations where it is exact; one of these is when there is a single nonconvex inequality constraint (this is very closely related to the dual derivations we did when analyzing robust least squares). There are other situations where it is provably good; one example is MAXCUT above. There are other situations where it is arbitrarily bad.
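
Here is a minimal sketch of the lifted relaxation in cvxpy (assumed available), instantiated on the binary constraints $x_m^2 = 1$ from the start of this section (that is, $A_m = e_m e_m^T$, $b_m = 0$, $c_m = -1$); the objective data are random and purely illustrative:

```python
# Sketch: SDP relaxation of a QCQP via lifting, rank-1 constraint dropped.
import numpy as np
import cvxpy as cp

def lift(A, b, c):
    # F = [[A, b/2], [b^T/2, c]], so <F, [x; 1][x; 1]^T> = x^T A x + b^T x + c
    return np.block([[A, b[:, None] / 2.0],
                     [b[None, :] / 2.0, np.array([[float(c)]])]])

rng = np.random.default_rng(4)
N = 5
A0 = rng.standard_normal((N, N)); A0 = (A0 + A0.T) / 2.0
F0 = lift(A0, rng.standard_normal(N), 0.0)
I = np.eye(N)
Fm = [lift(np.outer(I[m], I[m]), np.zeros(N), -1.0)    # encodes x_m^2 - 1 = 0
      for m in range(N)]

X = cp.Variable((N + 1, N + 1), PSD=True)     # lifted variable
constraints = [X[N, N] == 1] + [cp.trace(F @ X) == 0 for F in Fm]
prob = cp.Problem(cp.Minimize(cp.trace(F0 @ X)), constraints)
prob.solve()
print("SDP lower bound:", prob.value)         # <= min of the QCQP over {-1, 1}^N
```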

Example: Phase retrieval

In coherent imaging applications, a not uncommon problem is to reconstruct an unknown vector $x$ from measurements of the magnitudes of a series of linear functionals. We observe
$$y_m = |\langle x, a_m \rangle|^2 \ (+ \text{noise}), \quad m = 1, \dots, M.$$
For instance, if the $a_m$ are Fourier vectors, we are observing samples of the (squared) magnitude of the Fourier transform of $x$. If we also measured the phase, then recovering $x$ would be a standard linear inverse problem (and if we had a complete set of samples in the Fourier domain, we could just take an inverse Fourier transform). But since we do not get to see the phase, we have to estimate it along with the underlying $x$; this problem is often referred to as phase retrieval.

We can rewrite the measurements as
$$y_m = \langle a_m, x \rangle \langle x, a_m \rangle = x^H a_m a_m^H x = \operatorname{trace}(a_m a_m^H x x^H) = \langle X, A_m \rangle_F,$$
where $A_m = a_m a_m^H$ and $X = x x^H$. So solving the phase retrieval problem is the same as finding an $N \times N$ matrix $X$ with the following properties:
$$\langle X, A_m \rangle_F = y_m, \quad m = 1, \dots, M, \qquad X \succeq 0, \qquad \operatorname{rank}(X) = 1.$$
The first condition is just that $X$ obeys a certain set of linear equality constraints; the second is that $X$ is in the SDP cone; the third is a nonconvex constraint. One convex relaxation for this problem simply drops the rank constraint and finds a feasible point that obeys the first two conditions. Under certain conditions on the $a_m$, there will only be one point in this intersection once $M$ is mildly larger than $N$.
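
A sketch of this relaxation in cvxpy (assumed available), using real-valued Gaussian $a_m$ for simplicity; minimizing $\operatorname{trace}(X)$ is a common surrogate for the pure feasibility problem, and a candidate $x$ is read off the top eigenvector of the solution. The sizes here are made up:

```python
# Sketch: phase retrieval by semidefinite relaxation (rank constraint dropped).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(5)
N, M = 8, 40                                   # M mildly larger than N
a = rng.standard_normal((M, N))                # measurement vectors a_m
x_true = rng.standard_normal(N)
y = (a @ x_true) ** 2                          # y_m = <a_m, x>^2, noiseless

X = cp.Variable((N, N), PSD=True)              # plays the role of x x^H
constraints = [cp.quad_form(a[m], X) == y[m] for m in range(M)]
prob = cp.Problem(cp.Minimize(cp.trace(X)), constraints)
prob.solve()

w, U = np.linalg.eigh(X.value)                 # read off a rank-1 factor
x_hat = np.sqrt(max(w[-1], 0.0)) * U[:, -1]
print(min(np.linalg.norm(x_hat - x_true),      # small up to global sign,
          np.linalg.norm(x_hat + x_true)))     # when recovery succeeds
```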

Example: Maximum likelihood MIMO decoding

In MIMO (multiple-input multiple-output) digital communications, we have $N$ transmit antennas and $M$ receive antennas. Each transmit antenna sends a bit ($\pm 1$); these are collected together into an $N$-vector $x$. The receive antennas observe
$$y = H x + v,$$
where $H$ is a known $M \times N$ fading matrix (or channel matrix), and $v \sim \text{Normal}(0, \sigma^2 I)$.

1. The maximum likelihood decoder finds the binary-valued vector $x$ that makes the observations $y$ most likely:
$$\hat{x}_{\text{ML}} = \underset{x \in \{-1, 1\}^N}{\arg\max}\ p(y \mid x).$$
Show how this can be written as a least-squares problem with integer constraints.

2. Show how the optimization program above can be relaxed into a semidefinite program.

References

[BTN01] A. Ben-Tal and A. Nemirovski. Lectures on Modern Convex Optimization. SIAM, 2001.

[GW95] M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. Assoc. Comput. Mach., 42(6):1115-1145, November 1995.

Georgia Tech ECE 8823a Notes by J. Romberg.