Semidefinite and Second Order Cone Programming Seminar Fall 2001 Lecture 5
Instructor: Farid Alizadeh
Scribe: Anton Riabov
10/08/2001

1 Overview

We continue studying the maximum eigenvalue SDP and generalize it to an affine set of matrices. Next, we define the Lovász theta function and prove Lovász's sandwich theorem. Finally, we introduce a further extension of the eigenvalue SDP for finding the minimum sum of the k largest eigenvalues over an affine set of matrices.

2 Finding Largest Eigenvalue (Continued)

In the previous lecture we introduced the sets

    S := {X : tr X = 1, X ⪰ 0},        E := {qq^T : ||q||_2 = 1}.

We showed that E ⊆ S and that the members of E are the extreme points of S. For a fixed symmetric matrix A we defined the largest eigenvalue SDP as:

    λ_1(A) = min  z
             s.t. zI - A ⪰ 0.

The dual to this problem is:

    λ_1(A) = max  A • Y
             s.t. tr Y = 1,
                  Y ⪰ 0.
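(The following numerical check is an added illustration, not part of the original notes. It solves the dual SDP above with cvxpy, assuming cvxpy and an SDP-capable solver such as SCS are installed, and compares the optimal value with the largest eigenvalue computed by numpy.)

```python
import numpy as np
import cvxpy as cp

np.random.seed(0)
n = 5
B = np.random.randn(n, n)
A = (B + B.T) / 2                      # a random symmetric matrix

# Dual SDP: maximize A . Y over the set S = {Y : tr Y = 1, Y PSD}.
Y = cp.Variable((n, n), symmetric=True)
dual = cp.Problem(cp.Maximize(cp.trace(A @ Y)),
                  [cp.trace(Y) == 1, Y >> 0])
dual.solve()

print(dual.value)                      # optimal value of the dual SDP
print(np.linalg.eigvalsh(A).max())     # lambda_1(A); the two should agree
```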

Note that S is exactly the feasible set of the dual. As we know, the optimal value is attained at an extreme point of this set, so there is an optimal solution of the dual with Y ∈ E. Thus there exists q with ||q||_2 = 1 such that Y = qq^T, and we can write

    λ_1(A) = max_{Y ∈ S} A • Y = max_{||q||_2 = 1} (qq^T) • A = max_{||q||_2 = 1} q^T A q.    (1)

This is a well-known result from linear algebra, which we have now proven using semidefinite programming duality.

Note 1 Similarly, we can show that the smallest eigenvalue λ_n(A) can be expressed as

    λ_n(A) = min_{||q||_2 = 1} q^T A q.

3 Lovász Theta Function

In this section we describe extensions of the largest eigenvalue problem. Interested students are referred to the classic book Geometric Algorithms and Combinatorial Optimization by M. Grötschel, L. Lovász and A. Schrijver, Chapter 9.

3.1 Optimizing Largest Eigenvalue Over an Affine Set

Suppose now that we are solving the same largest eigenvalue problem, but the matrix A is not fixed, and we want to minimize the largest eigenvalue over a set of matrices. For example, consider a matrix that depends affinely on parameters,

    A := A_0 + x_1 A_1 + ... + x_m A_m.

The problem in this case translates into the unconstrained optimization problem

    min_{x ∈ R^m} λ_1(A_0 + x_1 A_1 + ... + x_m A_m).

This function is not linear. Furthermore, it is not smooth, and this makes optimization a difficult task. However, we will prove that the function is convex, and therefore the task of finding its minimum should be tractable.

Proposition 1 The function f(x) := λ_1(A_0 + x_1 A_1 + ... + x_m A_m) is convex.

Proof: We will prove that λ_1(A + B) ≤ λ_1(A) + λ_1(B), and the result will follow. Using (1), there exists q with ||q||_2 = 1 such that

    λ_1(A + B) = q^T (A + B) q = q^T A q + q^T B q.

This q is feasible, but not necessarily optimal, when (1) is applied to A and B separately. Therefore

    q^T A q + q^T B q ≤ λ_1(A) + λ_1(B),

and the subadditivity λ_1(A + B) ≤ λ_1(A) + λ_1(B) follows. Since λ_1(tA) = t λ_1(A) for t ≥ 0 and the matrix A_0 + x_1 A_1 + ... + x_m A_m depends affinely on x, subadditivity gives f(βx + (1-β)y) ≤ β f(x) + (1-β) f(y) for 0 ≤ β ≤ 1, i.e. f is convex.

We can rewrite the problem as an equivalent semidefinite programming problem:

    min  z
    s.t. zI - x_1 A_1 - ... - x_m A_m ⪰ A_0.

The dual of this problem is:

    max  A_0 • Y
    s.t. tr Y = 1,
         A_i • Y = 0,  i = 1, ..., m,
         Y ⪰ 0.

This problem has also been studied from the point of view of unconstrained optimization, and formulated that way it turns out to be very difficult: the graph of the function is not smooth, so Newton's method cannot be applied directly, since the derivative does not always exist. Semidefinite programs, in turn, can be solved efficiently to any given precision. We will describe algorithms for solving SDPs later in the course.

3.2 Maximum Clique, Stable Set and Graph Coloring Problems

Assume G = (V, E) is an undirected graph without loops, i.e. edges (i, i) are not allowed in G.

Definition 1 Maximum clique problem: find a maximum fully connected subgraph of G (a clique), i.e. find a subgraph G_0 = (V_0, E_0), V_0 ⊆ V, E_0 ⊆ E, such that (i, j) ∈ E_0 for all i, j ∈ V_0 with i ≠ j, and |V_0| is as large as possible.

The maximum clique problem is NP-complete. In fact, it was one of the original problems shown to be NP-complete.

Example 1 (Maximum Clique) Consider the graph in Figure 1. In this graph the maximal cliques of size more than 2 are {2,3,4,5} and {1,2,5}. The former clique is maximum.

Definition 2 Maximum independent (or stable) set problem: find a maximum set of vertices no two of which are connected by an edge.

Figure 1: An example of an undirected graph.

The maximum independent set problem is in some sense a dual of the maximum clique problem: the maximum independent set of a graph is the maximum clique of its complement.

Definition 3 The complement graph Ḡ = (V, Ē) of a graph G = (V, E) is the graph on the same set of vertices in which every pair of distinct vertices that is connected in the original graph is not connected in the complement, and vice versa.

Example 2 (Stable Set) For the graph in Figure 1, the optimal solutions are {1,4} and {1,3}.

Figure 2: Complement of the graph in Figure 1.

Example 3 (Complement Graph) Figure 2 shows the complement of the graph shown in Figure 1.

Note 2 The graph G is the complement of Ḡ.

Note 3 Every solution of the maximum stable set problem on a graph G is a solution of the maximum clique problem on the graph Ḡ, and vice versa.

Definition 4 Minimum graph coloring problem: assign colors to the vertices so that every pair of adjacent vertices receives different colors, using the smallest possible total number of colors. The number of colors used in an optimal solution is called the chromatic number of the graph.
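(The following brute-force check is an added illustration, not part of the original notes; it assumes plain Python only. The edge set of the graph in Figure 1 is reconstructed from Examples 1 and 2.)

```python
from itertools import combinations, product

# Edges of the graph in Figure 1 (vertices 1..5), as read off Examples 1 and 2.
edges = {(1, 2), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)}
V = range(1, 6)

def adjacent(i, j):
    return (min(i, j), max(i, j)) in edges

def omega():  # size of a maximum clique
    return max(len(S) for k in range(1, 6) for S in combinations(V, k)
               if all(adjacent(i, j) for i, j in combinations(S, 2)))

def alpha():  # size of a maximum stable set
    return max(len(S) for k in range(1, 6) for S in combinations(V, k)
               if not any(adjacent(i, j) for i, j in combinations(S, 2)))

def chi():  # chromatic number: smallest k admitting a proper coloring
    for k in range(1, 6):
        if any(all(c[i - 1] != c[j - 1] for i, j in edges)
               for c in product(range(k), repeat=5)):
            return k

print(omega(), alpha(), chi())   # expected: 4 2 4
```

Note 3 can be checked the same way: the stable sets {1,4} and {1,3} of this graph are exactly the maximum cliques of the complement graph in Figure 2.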

Finding the chromatic number of a graph is also NP-complete. In fact, it can be proven that there is no ε-approximation algorithm for this problem: for any given ε ≥ 1 it is impossible to give a polynomial-time algorithm that for every graph G finds a number guaranteed to be at most εχ(G), unless P = NP.

We will use the following notation for the optimal values of these problems:

    ω(G) - size of a maximum clique in the graph G;
    α(G) - size of a largest stable set in the graph G;
    χ(G) - chromatic number of the graph G.

Proposition 2 ω(G) ≤ χ(G).

Proof: The vertices of a clique must all have different colors.

Example 4 (Strict Inequality) Equality does not hold in general. The cycle on 5 vertices shows that the inequality above can be strict.

Figure 3: Odd cycle example.

It is easy to see that the graph shown in Figure 3 cannot be colored with fewer than 3 colors, so χ(G) = 3. However, the largest clique in this graph has only 2 vertices, so ω(G) = 2 < 3 = χ(G). The same is true for any odd cycle of length at least 5.

Proposition 3 Let a symmetric matrix A ∈ R^{n×n} and its submatrix B ∈ R^{k×k} be such that

    A = [ B    C ]
        [ C^T  D ],

i.e. B is the leading k × k principal submatrix of A. Then λ_1(B) ≤ λ_1(A).

Proof: From (1) it follows that there exists q ∈ R^k with ||q||_2 = 1 such that

    λ_1(B) = q^T B q.

Recall that the vector q achieving the maximum in (1) is an eigenvector corresponding to the eigenvalue λ_1(B). Further,

    λ_1(B) = q^T B q = (q^T, 0) A (q^T, 0)^T ≤ λ_1(A),

where A is partitioned as in the proposition and the inequality follows from the fact that the unit vector (q^T, 0)^T is feasible, but not necessarily optimal, for the entire matrix A in (1).

Corollary 1 If A ∈ R^{n×n} is a symmetric matrix and B ∈ R^{k×k} is a principal submatrix of A, i.e. B is formed by the rows and columns of A corresponding to the same index subset, then λ_1(B) ≤ λ_1(A).

Proof: The corollary states that the submatrix B does not have to sit in the corner of A; it can be distributed over A, as long as corresponding rows and columns participate. We will use permutation matrices to prove this. The matrix A can be transformed to the structure required in Proposition 3 by multiplication with permutation matrices. A permutation matrix is a square 0-1 matrix having exactly one non-zero entry in each row and in each column. If multiplying by a permutation matrix P on the right rearranges the columns, then multiplying by P^T on the left rearranges the corresponding rows. Therefore there exists a permutation matrix P such that P^T A P has the required structure. It is known from linear algebra that permutation matrices are orthogonal, P^T = P^{-1}. Therefore the permutation does not change the eigenvalues, and P^T A P has the same eigenvalues as A. Thus

    λ_1(B) ≤ λ_1(P^T A P) = λ_1(A),

where the inequality follows from Proposition 3.

Let us now apply this result to the adjacency matrix A of an arbitrary graph G. The adjacency matrix is defined as A = (a_ij), where a_ij = 1 if there is an edge (i, j), and a_ij = 0 otherwise. We will illustrate the procedure on the example graph in Figure 1. For this graph, the adjacency matrix is

    A = [ 0 1 0 0 1 ]
        [ 1 0 1 1 1 ]
        [ 0 1 0 1 1 ]
        [ 0 1 1 0 1 ]
        [ 1 1 1 1 0 ]

and

    A + I = [ 1 1 0 0 1 ]
            [ 1 1 1 1 1 ]
            [ 0 1 1 1 1 ]
            [ 0 1 1 1 1 ]
            [ 1 1 1 1 1 ].

It is easy to see that cliques in the graph correspond to all-ones principal submatrices of A + I, formed by the corresponding rows and columns. Now we can apply Corollary 1: for each such clique submatrix J_k = 11^T of size k × k,

    λ_1(J_k) ≤ λ_1(A + I).    (2)
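(A quick numerical sanity check of (2), added here and not part of the original notes, assuming numpy is available: for the graph of Figure 1, the principal submatrix of A + I indexed by the clique {2,3,4,5} is the all-ones matrix J_4.)

```python
import numpy as np

# Adjacency matrix of the graph in Figure 1 (vertices 1..5).
A = np.array([[0, 1, 0, 0, 1],
              [1, 0, 1, 1, 1],
              [0, 1, 0, 1, 1],
              [0, 1, 1, 0, 1],
              [1, 1, 1, 1, 0]], dtype=float)
M = A + np.eye(5)

clique = [1, 2, 3, 4]                  # 0-based indices of the clique {2,3,4,5}
J = M[np.ix_(clique, clique)]          # principal submatrix: the all-ones J_4

print(np.linalg.eigvalsh(J).max())     # 4.0, the clique size
print(np.linalg.eigvalsh(M).max())     # at least 4, consistent with (2)
```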

The eigenvalues of J_k are easy to compute. It is a rank-1 matrix, so it has only one non-zero eigenvalue; the all-ones vector 1 is the corresponding eigenvector, and the eigenvalue equals k. Thus λ_1(J_k) = k, and substituting into (2),

    k ≤ λ_1(A + I).

Since this holds for every clique size k, it also holds for the maximum clique size:

    ω(G) ≤ λ_1(A + I).    (3)

3.3 Lovász Theta Function

Let us see whether inequality (3) can be made tighter, so that we obtain a better estimate of ω(G). Note that Corollary 1 can still be applied in the same way, and (3) still holds, if the zeros in A + I are replaced with arbitrary values, as long as the matrix remains symmetric. The reason is that there are no zeros inside any clique submatrix J_k, so that part (corresponding to B in Corollary 1) remains unchanged. Formally, let

    [A(x)]_ij = 1,              if (i, j) ∈ E or i = j;
    [A(x)]_ij = x_ij = x_ji,    otherwise.

Now we can write a stronger version of inequality (3):

    ω(G) ≤ min_x λ_1(A(x)).    (4)

Definition 5 The right-hand side of (4) is called the Lovász θ-function of the graph. It is denoted

    θ(G) = min_x λ_1(A(x)).

Note that A(x) defined above can be written as A(x) = A_0 + x_1 A_1 + x_2 A_2 + ... + x_m A_m. The example below illustrates how this can be done.

Example 5 (Rewriting A(x) as a sum)

    [ 1  x  y ]
    [ x  1  1 ]  =  x(E_12 + E_21) + y(E_13 + E_31) + (A + I).
    [ y  1  1 ]

We know that ω(G) ≤ χ(G). Now we can formulate the following theorem, which makes a stronger statement. It is also known as Lovász's sandwich theorem.

Theorem 1 ω(G) ≤ θ(G) ≤ χ(G).
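(Added illustration, not part of the original notes: the sketch below computes θ for the 5-cycle of Figure 3 by minimizing λ_1(A(x)) directly, with the non-edge entries of A(x) left free. It assumes cvxpy with an SDP-capable solver such as SCS. For the 5-cycle the value should come out near √5 ≈ 2.236, which lies between ω = 2 and χ = 3 and improves on the bound λ_1(A + I) = 3 from (3).)

```python
import numpy as np
import cvxpy as cp

n = 5
edges = [(i, (i + 1) % n) for i in range(n)]      # the 5-cycle of Figure 3

A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1

# theta(G) = min over A(x) of lambda_1(A(x)): entries are fixed to 1 on the
# diagonal and on edges, and are free (symmetric) elsewhere.
X = cp.Variable((n, n), symmetric=True)
constraints = [X[i, i] == 1 for i in range(n)]
constraints += [X[i, j] == 1 for i, j in edges]
theta = cp.Problem(cp.Minimize(cp.lambda_max(X)), constraints).solve()

print(theta)                                      # approx. 2.236 = sqrt(5)
print(np.linalg.eigvalsh(A + np.eye(n)).max())    # 3.0, the weaker bound (3)
```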

Before we prove this theorem, let us make several observations regarding graph sums.

Definition 6 Given graphs G_1 = (V_1, E_1) and G_2 = (V_2, E_2) on disjoint vertex sets, the sum of graphs G_1 ⊕ G_2 is the graph formed by putting G_1 and G_2 side by side; in other words, G_1 ⊕ G_2 = (V_1 ∪ V_2, E_1 ∪ E_2).

The following propositions are straightforward to prove, so the proofs are not given.

Proposition 4 ω(G_1 ⊕ G_2) = max{ω(G_1), ω(G_2)}.

Proposition 5 χ(G_1 ⊕ G_2) = max{χ(G_1), χ(G_2)}.

Given the two propositions above, the following proposition may be anticipated, but its proof is not as obvious.

Proposition 6 θ(G_1 ⊕ G_2) = max{θ(G_1), θ(G_2)}.

Proof: Write an SDP for computing the value of θ(G):

    θ(G) = min  z
           s.t. zI - X ⪰ 0,
                x_ij = 1 if i = j or (i, j) ∈ E.

Here is the dual of this problem:

    max  (A + I) • Y = Σ_{(i,j) ∈ E} y_ij + Σ_i y_ii
    s.t. tr Y = 1,
         y_ij = 0 for (i, j) ∉ E, i ≠ j,
         Y ⪰ 0.

Now let us write the dual for the graph sum G_1 ⊕ G_2. Since there are no edges between the two graphs in the sum, the dual variable has the block structure

    Y = [ Y_1   0  ]
        [  0   Y_2 ].

Thus, the dual program is (with A_1 and A_2 the adjacency matrices of G_1 and G_2):

    max  (A_1 + I) • Y_1 + (A_2 + I) • Y_2
    s.t. tr Y_1 + tr Y_2 = 1,    (5)
         y_ij = 0 for (i, j) ∉ E, i ≠ j,
         Y_1, Y_2 ⪰ 0.

The formulation above resembles the original dual formulation. It could be separated into two independent problems (one for G_1 and one for G_2), and the proposition would follow immediately, were it not for constraint (5), which couples Y_1 and Y_2. Note that if in the dual problem we replace the constraint tr Y = 1 by tr Y = α for some α > 0, the optimal value is simply scaled by α. Now choose α with 0 ≤ α ≤ 1, replace (5) by the following two constraints, and solve the two resulting dual problems separately:

    tr Y_1 = α,        tr Y_2 = 1 - α.

The optimal values of these independent problems are, as we know, αθ(G_1) and (1 - α)θ(G_2) respectively. The optimal value of the dual problem for G_1 ⊕ G_2 is then

    max_{0 ≤ α ≤ 1} [ αθ(G_1) + (1 - α)θ(G_2) ] = max{θ(G_1), θ(G_2)},

where the equality holds because the objective is linear in α, so its maximum over [0, 1] is attained at an endpoint.

Now we are ready to prove Theorem 1.

Proof: Denote k = χ(G). The nodes of the graph G can be partitioned into k classes according to their colors. We can make all classes contain the same number of nodes, say r, by introducing additional isolated vertices, which are connected to no other vertex and can therefore receive any color. By Proposition 6 this does not change θ(G), since θ of a single vertex is 1. Denote the new graph by G_1. Then θ(G) = θ(G_1) and |V(G_1)| = kr.

Consider the matrix A(α) ∈ R^{kr×kr},

    A(α) := [ αJ_r + (1-α)I_r    J_r                ...    J_r              ]
            [ J_r                αJ_r + (1-α)I_r    ...    J_r              ]
            [ ...                ...                ...    ...              ]
            [ J_r                J_r                ...    αJ_r + (1-α)I_r  ].

Note that αJ_r + (1-α)I_r is the symmetric r × r matrix

    αJ_r + (1-α)I_r = [ 1  α  ...  α ]
                      [ α  1  ...  α ]
                      [ ...          ]
                      [ α  α  ...  1 ],

and J_r, as before, is the all-ones matrix of size r × r. It is easy to see that A(α) is a special case of the matrix A(x) for the graph G_1: vertices within a color class are not connected, so if we rename the vertices so that the classes follow one another, the adjacency matrix of G_1 has zero blocks of size r × r on the diagonal.

Now we can replace all the resulting zeros with variables: we set the free entries within each diagonal block to α and all remaining free entries to 1, which preserves symmetry. Since A(α) is such a special case, the definition of θ(G_1) gives

    θ(G_1) ≤ λ_1(A(α)).

Note that we can rewrite A(α) as

    A(α) = J_kr - (1 - α)(J_r ⊕ J_r ⊕ ... ⊕ J_r) + (1 - α)I.    (6)

We would now like to find λ_1(A(α)) as a function of α in closed form, and choose α such that λ_1(A(α)) ≤ k = χ(G_1) = χ(G). Note that α does not have to be positive.

The eigenvalues of a sum of matrices are sums of eigenvalues of the terms if the matrices commute. It is known from linear algebra that matrices A and B satisfy AB = BA if and only if A and B share a system of common eigenvectors. For symmetric matrices this is almost obvious from the eigenvalue decomposition: if A = Q^T D Q and B = Q^T C Q, then A + B = Q^T (D + C) Q when the matrices commute.

We claim that J_kr, J_r ⊕ ... ⊕ J_r and I commute. For I this is clear, since it commutes with every matrix. It is also straightforward to verify that the all-ones vector 1, together with any basis of its orthogonal complement, can be taken as eigenvectors of J_kr. The vector 1 is also an eigenvector of J_r ⊕ ... ⊕ J_r; the remaining eigenvectors of J_r ⊕ ... ⊕ J_r can be chosen orthogonal to 1, and are then also eigenvectors of J_kr. This proves the claim.

The table below summarizes the eigenvalues of the three terms in the sum (6) on the common eigenvectors:

    common eigenvectors (multiplicity)                  J_kr    -(1-α)(J_r ⊕ ... ⊕ J_r)    (1-α)I
    the all-ones vector 1                  (1)          kr      -(1-α)r                    1-α
    block-constant vectors orthogonal to 1 (k-1)        0       -(1-α)r                    1-α
    vectors summing to zero on each block  (k(r-1))     0       0                          1-α

Thus, in order to complete the proof, we need to show that there exists an α satisfying:

    kr - (1-α)r + (1-α) ≤ k,
    (1-α) - (1-α)r ≤ k,
    1 - α ≤ k.

Solving the first inequality for α (when r > 1) gives α ≤ 1 - k; solving the last one gives α ≥ 1 - k. Setting α = 1 - k therefore satisfies all three inequalities (the middle one reads (1-α)(1-r) ≤ k, and its left-hand side is non-positive). Hence λ_1(A(1-k)) ≤ k, and so θ(G) = θ(G_1) ≤ k = χ(G), which completes the proof.

As we have just seen, this theorem bounds the optimal value of a very hard integer program (IP) using results obtained from semidefinite programming (SDP). Later we will show that any IP can be relaxed to an SDP in a similar way. SDP relaxations are typically tighter than relaxations to linear programs, and we will give a general framework for creating such relaxations.

4 Computing Sums of Eigenvalues

Consider a further generalization of the optimization problem we have been working with. As before, define

    A(x) = A_0 + x_1 A_1 + x_2 A_2 + ... + x_m A_m.

But now we want the sum of a given number of largest eigenvalues, instead of just the single largest eigenvalue:

    f(x_1, x_2, ..., x_m) = (λ_1 + λ_2 + ... + λ_k)(A_0 + x_1 A_1 + x_2 A_2 + ... + x_m A_m).

And, as in the previous sections, we would like to find the unconstrained minimum:

    min_x f(x_1, x_2, ..., x_m).

We can write an SDP for this problem, but the straightforward approach (writing a constraint for every possible sum of k eigenvalues) results in an exponential number of constraints, so a more sophisticated approach is needed. We will first illustrate the ideas we are going to employ on a small linear program.

Example 6 (Linear Programming) Consider the following linear program (LP):

    max  w_1 x_1 + ... + w_n x_n
    s.t. x_1 + ... + x_n = k,    (7)
         0 ≤ x_i ≤ 1.

Suppose k is an integer. This program finds the k largest values of w_i and sets the corresponding x_i to 1. All extreme points of the feasible set of this LP are integral. This can be proven, for example, using complementary slackness. Let us write the dual of this problem:

    min  kz + Σ_i y_i
    s.t. z + y_i ≥ w_i,
         y_i ≥ 0.

Now, using complementary slackness, we can show that a fractional solution is either non-optimal or not an extreme point.

Another way to show that all extreme points of the LP are integral is to notice that the constraint matrix of this problem is totally unimodular (every square submatrix has determinant 0, 1, or -1). Either way, it can be shown that the extreme points of this polytope are 0-1 vectors with exactly k ones. Introducing just one constraint on the sum (equation (7) in the LP) greatly restricts the degrees of freedom of x.
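(As a small numerical illustration of Example 6, added here and not part of the original notes: the sketch below solves the LP with cvxpy for a random w and checks that the optimal value equals the sum of the k largest entries of w; cvxpy with any LP solver is assumed to be available.)

```python
import numpy as np
import cvxpy as cp

np.random.seed(1)
n, k = 8, 3
w = np.random.randn(n)

x = cp.Variable(n)
lp = cp.Problem(cp.Maximize(w @ x),
                [cp.sum(x) == k, x >= 0, x <= 1])
lp.solve()

print(lp.value)                    # LP optimum
print(np.sort(w)[-k:].sum())       # sum of the k largest w_i; should match
print(np.round(x.value, 4))        # for generic w, a 0-1 vector with k ones
```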