CSC2411 - Linear Programming and Combinatorial Optimization
Lecture 10: Semidefinite Programming
Notes taken by Mike Jamieson
March 28, 2005

Summary: In this lecture, we introduce semidefinite programming (SDP). We describe SDP as a generalization of linear programming (LP) and show how an LP and other problems can be written as SDPs. We present vector programming (VP) as another form of SDP and discuss duality in the SDP context. Finally, we note some important differences between LP and SDP.

1  Components of Semidefinite Programming

Today we go beyond LP into semidefinite programming (SDP). The development of SDP was one of the major successes of the 1990s. It is a special case of convex optimization with many useful properties.

1.1  LP and Closed Cones

Recall that in standard form, an LP may be written as

    min  ⟨c, x⟩
    s.t. ⟨a_i, x⟩ = b_i,  i = 1..m
         x ≥ 0.

If we define K = (R^+)^n = {x : x ≥ 0}, then we can rewrite the last condition above as x ∈ K.

Definition 1.1. A set K ⊆ R^n is a closed cone if K is closed and, for all x, y ∈ K and α, β ∈ R^+, we have αx + βy ∈ K.

Lecture Notes for a course given by Avner Magen, Dept. of Computer Science, University of Toronto.
1.2  Positive Semidefinite Matrices

Definition 1.2. An n×n matrix X is positive semidefinite (PSD) if for all a ∈ R^n, a^T X a ≥ 0.

We denote the set of all symmetric n×n PSD matrices by PSD_n:

    PSD_n = {X : X is an n×n symmetric matrix, X is PSD}.

Claim 1.3. PSD_n is a closed cone.

Proof. To see that PSD_n is closed under non-negative combinations, consider X, Y ∈ PSD_n and α, β ∈ R^+; for every a ∈ R^n,

    a^T (αX + βY) a = α a^T X a + β a^T Y a ≥ 0.

To show that PSD_n is closed, it is sufficient to show that its complement ¬PSD_n is open. For every X ∉ PSD_n there exist a ∈ R^n and ε > 0 with a^T X a + ε < 0. For any Y ∈ R^{n×n} (with finite entries) there exists k > 0 sufficiently small that |k a^T Y a| < ε, and therefore

    a^T (X + kY) a < 0.

This shows that for any X ∉ PSD_n, any sufficiently small perturbation of X is also outside PSD_n. Therefore ¬PSD_n is open and its complement PSD_n is closed. □

In the LP formulation, we require x ∈ (R^+)^n. In the next section, we describe semidefinite programs in which the cone (R^+)^n is replaced by PSD_n. For now, we concentrate on the algebraic properties of PSD_n.

Claim 1.4. For a symmetric matrix A, the following are equivalent:
(i) A is PSD;
(ii) the eigenvalues of A are non-negative;
(iii) A can be written as V^T V (V is an n×n matrix).

Proof. To show (iii) ⟹ (i), consider that, given (iii), for V ∈ R^{n×n} and any x ∈ R^n,

    x^T A x = x^T V^T V x = ‖Vx‖₂² ≥ 0.

For (i) ⟹ (ii), let λ ∈ R be an eigenvalue of A and x ∈ R^n a corresponding (nonzero) eigenvector, so that Ax = λx. Then, since A ∈ PSD_n,

    λ‖x‖₂² = x^T λx = x^T A x ≥ 0,

and hence λ ≥ 0. To demonstrate (ii) ⟹ (iii), we employ the fact that if A is an n×n symmetric matrix, it has a decomposition
    A = U^T D U,

where U and D are n×n matrices, U is orthogonal, D is diagonal, and the diagonal elements of D are the eigenvalues of A. If all eigenvalues of A are non-negative, then √D ∈ R^{n×n}, so

    A = U^T D U = (U^T √D^T)(√D U) = V^T V.  □

Remark 1.5. To sketch how we can decompose A, we note that a symmetric matrix can be made diagonal using symmetric row and column operations. One row operation removes a term below the main diagonal, while the symmetric column operation removes the corresponding above-diagonal term:

    [1 2]  R_2 ← R_2 − 2R_1  [1 2]  C_2 ← C_2 − 2C_1  [1 0]
    [2 7]                    [0 3]                    [0 3]

These row and column operations are equivalent to applying the corresponding modifications of I on the left and right of A:

    [ 1 0] · A · [1 −2]
    [−2 1]       [0  1]

Remark 1.6. (i) ⟺ (iii) says that A is PSD if and only if there exist v_1, …, v_n ∈ R^n so that a_ij = ⟨v_i, v_j⟩:

    A ∈ PSD_n  ⟺  A = V^T V = [v_1 … v_n]^T [v_1 … v_n] = (⟨v_i, v_j⟩)_ij.

So each element a_ij encodes the dot product of a pair of vectors, and A as a whole can be thought of as encoding the relationships among a set of vectors. Note that V is not unique: we can rotate or reflect the set of vectors without changing the inner products.

Remark 1.7. For x ∈ (R^+)^n, all elements of x are non-negative. X ∈ PSD_n does not place such a constraint on the individual elements. Instead, from (i) ⟺ (ii), we see that the eigenvalues of the matrix as a whole are non-negative.
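The Gram-matrix view above (a_ij = ⟨v_i, v_j⟩ and x^T A x = ‖Vx‖₂²) can be checked numerically. The following is a minimal sketch in pure Python; the vectors used are the ones from Example 1.8 (ii) below, and the test points x are arbitrary.

```python
# Build A = V^T V from a set of column vectors and check that a_ij = <v_i, v_j>
# and that x^T A x = ||Vx||^2 >= 0 for any x, as in the PSD characterization.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram(vectors):
    """Gram matrix: entry (i, j) is <v_i, v_j>."""
    return [[dot(vi, vj) for vj in vectors] for vi in vectors]

def quad_form(A, x):
    """Compute x^T A x."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

# Columns of V, written as a list of vectors v_1, ..., v_n.
vs = [(1.0, 1.0), (1.0, 0.0)]
A = gram(vs)                         # [[2.0, 1.0], [1.0, 1.0]]
assert A[0][1] == dot(vs[0], vs[1])  # a_ij encodes the pairwise dot product

# x^T A x equals ||Vx||^2, hence is non-negative for every x.
for x in [(1.0, -3.0), (-2.0, 0.5), (0.0, 1.0)]:
    Vx = [sum(v[k] * x[i] for i, v in enumerate(vs)) for k in range(2)]
    assert abs(quad_form(A, x) - dot(Vx, Vx)) < 1e-9
    assert quad_form(A, x) >= 0
```

Note that replacing `vs` by any rotation or reflection of the same vectors would produce the same Gram matrix A, illustrating why V is not unique.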
Example 1.8. Some instances of PSD matrices A with the corresponding V and v_i:

(i) Let a_1, …, a_n ≥ 0 and A = diag(a_1, …, a_n). Then

    V = diag(√a_1, …, √a_n),   v_i = √a_i · e_i  (the i-th standard basis vector, scaled).

(ii)
    A = [2 1]    V = [1 1]    v_1 = (1, 1)^T,  v_2 = (1, 0)^T.
        [1 1]        [1 0]

1.3  Matrix Inner Product (Frobenius Inner Product)

We would like to view matrices as vectors when we apply linear constraints and optimize linear functions on them. The matrix inner product is SDP's counterpart to the vector inner product in LP.

Definition 1.9. The matrix inner product of A and B is

    A • B = Σ_{i,j} a_ij b_ij = trace(A B^T).

Notice that if A and B are thought of simply as reshaped vectors, this definition is equivalent to the usual vector inner product.

2  Writing Semidefinite Programs

A semidefinite program (SDP) has the form

    min  C • X
    s.t. A_i • X = b_i,  i = 1..m
         X ⪰ 0,

where X, C and each A_i are n×n matrices. Whereas in LP we had n variables, in SDP the objective, constraint and solution matrices have n² elements. Note that we may also write SDPs where the objective is to maximize C • X. The last condition, X ⪰ 0, uses an order relation in which 0 represents the n×n zero matrix; X ⪰ 0 means that X ∈ PSD_n, and more generally

    X ⪰ B  ⟺  B ⪯ X  ⟺  (X − B) ⪰ 0  ⟺  ∀x ∈ R^n, x^T (X − B) x ≥ 0.

Next we study some examples of semidefinite programs.
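The two expressions in Definition 1.9 can be compared directly. A minimal sketch, using arbitrary 2×2 matrices as test data:

```python
# Check that the Frobenius inner product computed entrywise,
# sum_ij a_ij * b_ij, agrees with trace(A B^T).

def frob(A, B):
    """A . B = sum_ij a_ij * b_ij."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def mat_mul_Bt(A, B):
    """Compute the product A B^T."""
    n = len(A)
    return [[sum(A[i][k] * B[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
assert frob(A, B) == trace(mat_mul_Bt(A, B)) == 1*5 + 2*6 + 3*7 + 4*8
```

Flattening A and B row by row into vectors of length n² and taking the ordinary vector inner product gives the same number, which is the sense in which A • B is "the vector inner product on reshaped matrices".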
Example 2.1. Let X be n×n with n = 3, and let there be m = 2 constraints:

    C = [1 1 1]      A_1 = [1 0 1]      A_2 = [0 2 0]      b_1 = 0,
        [1 1 1]            [0 0 7]            [2 6 0]      b_2 = 2.
        [1 1 1]            [1 7 0]            [0 0 4]

Writing the problem more compactly in terms of the elements of X = (x_ij), we have

    min  Σ_{i,j} x_ij
    s.t. x_11 + 2x_13 + 14x_23 = 0
         4x_12 + 6x_22 + 4x_33 = 2
         X ∈ PSD_n.

Example 2.2. Given the following incomplete symmetric matrix,

    [5 3 ?]
    [3 4 ?]
    [? ? ?]

we want to complete it to a PSD matrix while minimizing the sum of its entries. Representing the solution matrix as X = (x_ij), we write

    min  Σ_{i,j} x_ij
    s.t. x_11 = 5,  x_21 = 3,  x_22 = 4
         X ∈ PSD_n.

Expressed in matrix form:

    min  [1 1 1]
         [1 1 1] • X
         [1 1 1]

    s.t. [1 0 0]           [0 0 0]           [0 0 0]
         [0 0 0] • X = 5,  [1 0 0] • X = 3,  [0 1 0] • X = 4,
         [0 0 0]           [0 0 0]           [0 0 0]

         X ⪰ 0.
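The compact rewriting in the first example above relies on X being symmetric, so each off-diagonal coefficient of A_i is counted twice. A small sketch of that expansion, using an arbitrary symmetric X (not necessarily feasible or PSD):

```python
# For symmetric X, A_1 . X collapses to x_11 + 2*x_13 + 14*x_23 and
# A_2 . X to 4*x_12 + 6*x_22 + 4*x_33, since each off-diagonal entry of
# A_i pairs with two equal entries of X.

def frob(A, B):
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

A1 = [[1, 0, 1],
      [0, 0, 7],
      [1, 7, 0]]
A2 = [[0, 2, 0],
      [2, 6, 0],
      [0, 0, 4]]

# An arbitrary symmetric test matrix.
X = [[1.0, 2.0, 3.0],
     [2.0, 5.0, 4.0],
     [3.0, 4.0, 6.0]]

assert frob(A1, X) == X[0][0] + 2 * X[0][2] + 14 * X[1][2]
assert frob(A2, X) == 4 * X[0][1] + 6 * X[1][1] + 4 * X[2][2]
```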
Observation 2.3. LP is a special case of SDP. It is enough to consider the standard form; given an LP over x ∈ R^n, let X be the diagonal matrix with x on its diagonal:

    LP                                   SDP
    min ⟨c, x⟩                  =        min diag(c_1, …, c_n) • X
    ∀i = 1..m, ⟨a_i, x⟩ = b_i   =        ∀i = 1..m, diag(a_i^(1), …, a_i^(n)) • X = b_i
                                         ∀i ≠ j, X_ij = 0  (by inner-product constraints)
    x ≥ 0                       =        X ⪰ 0

While we would not normally choose to write an LP in this form due to its inefficiency, it is useful to keep in mind that SDP is an extension of LP. Unfortunately, as we shall see later, not all the nice properties of LP are maintained in the extension.

Example 2.4. Suppose you have two sets of points in R^n: P = {p_1, …, p_r} and Q = {q_1, …, q_s}. In Figure 1, points in P are represented by o's and points in Q by x's. We wish to find an ellipse, centred at the origin, that includes all p_i ∈ P and excludes all q_i ∈ Q (where we allow points to lie on the boundary). Recall that an ellipse centred at the origin is {x : x^T A x ≤ 1} for some A ∈ PSD_n. Here we use the n×n matrix X ∈ PSD_n to represent the ellipse:

    ∀ p_i ∈ P:  p_i^T X p_i ≤ 1
    ∀ q_i ∈ Q:  q_i^T X q_i ≥ 1.                                    (1)

We write the ellipse constraints (1) as matrix inner products using p^T X p = (p p^T) • X, where p p^T is the outer product (which also happens to be PSD). Finally, we add slack variables s_{p_i} and s_{q_i} to transform each constraint into an equality:

    ∀ p_i ∈ P:  p_i^T X p_i ≤ 1  ⟺  (p_i p_i^T) • X ≤ 1  ⟺  (p_i p_i^T) • X + s_{p_i} = 1
    ∀ q_i ∈ Q:  q_i^T X q_i ≥ 1  ⟺  (q_i q_i^T) • X ≥ 1  ⟺  (q_i q_i^T) • X − s_{q_i} = 1.

We can incorporate the slack variables by writing the augmented solution matrix X as a block diagonal:

    X = [X   0   0 ]
        [0  S_P  0 ]      S_P = diag(s_{p_1}, …, s_{p_r}),   S_Q = diag(s_{q_1}, …, s_{q_s}).
        [0   0  S_Q]
Figure 1: An ellipse centred at the origin, separating two sets of points.

Figure 2: A feasible region bounded by an infinite family of linear constraints.

Then for each p_i ∈ P we have a constraint in matrix form:

    [p_i p_i^T  0   0]
    [    0     E_i  0]  • X = 1,
    [    0      0   0]

where E_i is the matrix with zeros everywhere except the i-th position along the diagonal, which is one. Similar constraints may be written for each q_i ∈ Q. Note that the final condition X ⪰ 0 simultaneously ensures that the original block X ⪰ 0 and that S_P, S_Q ⪰ 0. This in turn ensures that all slack variables are non-negative.

Remark 2.5. In Example 2.4, we rewrote the PSD property in terms of the outer product of a vector with itself and the matrix inner product:

    X ∈ PSD_n  ⟺  ∀a, a^T X a ≥ 0  ⟺  ∀a, (a a^T) • X ≥ 0.

For a specific vector a, the constraint (a a^T) • X ≥ 0 acts like a single LP constraint, a hyperplane in R^{n×n}. The constraint holds for any a ∈ R^n, so the outer products a a^T define an infinite family of linear constraints. Thus, in a sense, X satisfies infinitely many constraints, and the region of feasible solutions to an SDP is bounded by an infinite set of hyperplanes, as in Figure 2, for instance.
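The embedding of an LP into an SDP described above can also be checked numerically: put x on the diagonal of X, and both the objective and the constraints carry over through the matrix inner product. A minimal sketch; the LP data (c, a_1, and the point x) is made up for illustration.

```python
# Embed an LP point x into SDP form as X = diag(x) and check that
# <c, x> = diag(c) . X and <a_1, x> = diag(a_1) . X, as in the
# LP-as-SDP observation.

def frob(A, B):
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def diag(v):
    n = len(v)
    return [[v[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

c  = [3.0, 1.0, 2.0]   # hypothetical LP objective
a1 = [1.0, 1.0, 1.0]   # hypothetical LP constraint row
x  = [0.5, 2.0, 0.0]   # a point with x >= 0

X = diag(x)            # diagonal with non-negative entries, hence PSD
assert abs(frob(diag(c), X) - dot(c, x)) < 1e-9    # objectives agree
assert abs(frob(diag(a1), X) - dot(a1, x)) < 1e-9  # constraints agree
```

The off-diagonal entries of X are forced to zero by additional equality constraints, and X ⪰ 0 then reduces exactly to x ≥ 0, since a diagonal matrix is PSD iff its diagonal entries are non-negative.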
3  Vector Programming

Recall that X ∈ PSD_n iff there exist v_1, …, v_n ∈ R^n s.t. x_ij = ⟨v_i, v_j⟩. That is, instead of finding the best matrix X that satisfies a given set of constraints, we can think of finding an optimal set of vectors V = {v_1, …, v_n} with constraints on the vector inner products. We write a vector program as

    min  Σ_{i,j} c_ij ⟨v_i, v_j⟩
    s.t. Σ_{i,j} a_ij^(k) ⟨v_i, v_j⟩ = b_k,   k = 1..m
         v_1, …, v_n ∈ R^n.

Observation 3.1. This is identical to an SDP, just written in a different form. It is clear that a feasible solution here translates to a feasible SDP solution, by X = (x_ij) = (⟨v_i, v_j⟩), and vice versa, with C = (c_ij) and A_k = (a_ij^(k)).

Definition 3.2. A metric d on n points {1, …, n} is a function d_ij s.t.

    d_ij ≥ 0                 (non-negativity)
    d_ij = d_ji              (symmetry)
    d_ii = 0
    d_ij + d_jk ≥ d_ik       (triangle inequality)

Note that d can be represented as an n×n symmetric matrix.

Definition 3.3. A metric d on n points is Euclidean if there exist p_1, …, p_n ∈ R^n s.t.

    ‖p_i − p_j‖₂ = d_ij.                                            (2)

The set {p_1, …, p_n} is referred to as a Euclidean embedding of d.

Observation 3.4. Not every metric is Euclidean.

Proof. Consider Figure 3, for instance. Suppose {p_A, p_B, p_C, p_D} is a Euclidean embedding. Since

    ‖p_A − p_B‖₂ + ‖p_B − p_C‖₂ = d_AB + d_BC = d_AC = ‖p_A − p_C‖₂,

the points p_A, p_B and p_C must lie on a line with p_B at the midpoint. Similarly, p_A, p_B and p_D must also lie on a line with p_B at the midpoint. This collapses p_C and p_D, implying ‖p_C − p_D‖₂ = 0. However, this is inconsistent with d_CD = 2. □

We relax (2) and define the cost of the embedding as the minimum c such that

    ∀ i, j ∈ {1, …, n}:  d_ij ≤ ‖p_i − p_j‖₂ ≤ c · d_ij.
Figure 3: An example of a metric on four points that is not Euclidean.

The inequalities are preserved when squaring all terms:

    d_ij² ≤ ‖p_i − p_j‖₂² ≤ c² d_ij².

Using the connection between the Euclidean norm and the inner product,

    ‖p_i − p_j‖₂² = ⟨p_i − p_j, p_i − p_j⟩ = ⟨p_i, p_i⟩ + ⟨p_j, p_j⟩ − 2⟨p_i, p_j⟩.

In other words, taking C = c², we can formulate the problem of finding the minimum cost embedding as a VP instance:

    min  C
    s.t. ∀ i, j ∈ {1, …, n}:  d_ij² ≤ ⟨p_i, p_i⟩ + ⟨p_j, p_j⟩ − 2⟨p_i, p_j⟩ ≤ C · d_ij².

Observation 3.5. We can't hope to add a dimension constraint to the above, for example to ask for the vectors to be of dimension 5: vector programming with restrictions on dimension is NP-hard.

Example 3.6. Consider the Vertex Cover problem. Given a graph G = (V, E), find the smallest set of vertices S ⊆ V such that for all edges {i, j} ∈ E, either i ∈ S or j ∈ S (or both). Suppose we have a VP in which the vectors are restricted to dimension one. We have variables x_i for the vertices i ∈ V, and x_0 (an indicator value):

    min  Σ_i ⟨x_0, x_i⟩
    s.t. ‖x_i‖² = 1
         ∀ {i, j} ∈ E:  ⟨x_i, x_0⟩ + ⟨x_j, x_0⟩ ≥ 0
         x_i ∈ R^1.

Claim 3.7. A solution to the above translates to an exact solution to Vertex Cover, where S = {i : x_i = x_0} and Σ_i ⟨x_0, x_i⟩ = 2|S| − n.
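The dimension-one vertex cover program above is small enough to check by brute force: in R^1 the constraint ‖x_i‖² = 1 forces x_i = ±1, and the edge constraint says at least one endpoint of every edge agrees with x_0. A sketch on a made-up 4-vertex path graph:

```python
# Enumerate all +/-1 assignments for the dimension-one vertex cover VP on a
# path graph, check the identity sum_i <x_0, x_i> = 2|S| - n from the claim,
# and confirm the minimum objective corresponds to a minimum vertex cover.

from itertools import product

n = 4
edges = [(0, 1), (1, 2), (2, 3)]   # path on 4 vertices; min vertex cover size is 2
x0 = 1.0

best_obj = None
for xs in product([1.0, -1.0], repeat=n):
    # edge constraints of the VP: <x_i, x_0> + <x_j, x_0> >= 0
    if all(xs[i] * x0 + xs[j] * x0 >= 0 for i, j in edges):
        S = [i for i in range(n) if xs[i] == x0]
        obj = sum(x0 * xi for xi in xs)
        assert obj == 2 * len(S) - n                     # the claimed identity
        assert all(i in S or j in S for i, j in edges)   # S covers every edge
        if best_obj is None or obj < best_obj:
            best_obj = obj

# Minimum objective 2|S| - n is attained at a minimum vertex cover, |S| = 2.
assert best_obj == 2 * 2 - n
```

This also illustrates why the dimension restriction makes the problem hard: the feasible assignments are exactly the vertex covers, so solving the VP exactly solves Vertex Cover.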
4  Duality in SDP

As in LP, we should think of the dual as a linear method that bounds the optimum. To bound C • X from below, we want to:

(i) Find coefficients y_i for each of the m constraints.
(ii) Guarantee that for all X ⪰ 0, it holds that

    Σ_{i=1}^m y_i A_i • X ≤ C • X.                                  (3)

Having done so, since Σ_i y_i (A_i • X) = Σ_i y_i b_i for any feasible X, we get a lower bound on the objective of the primal:

    C • X ≥ ⟨y, b⟩.

What restrictions must we put on y so that (3) holds? Rewrite (3) as

    ∀ X ⪰ 0:  (C − Σ_{i=1}^m y_i A_i) • X ≥ 0.

In particular, for x ∈ R^n we have x x^T ⪰ 0, and so C − Σ y_i A_i must satisfy

    ∀ x ∈ R^n:  (C − Σ y_i A_i) • (x x^T) = x^T (C − Σ y_i A_i) x ≥ 0.

Therefore, to satisfy (3), (C − Σ y_i A_i) must be PSD. On the other hand, if X ⪰ 0 then X = V^T V, and

    X = Σ_i w_i w_i^T,   where w_1^T, …, w_n^T are the rows of V.

That is, every X ⪰ 0 can be represented as a sum of outer products, and so, if (C − Σ_{i=1}^m y_i A_i) satisfies (3) for every x x^T, it satisfies it for every X ⪰ 0.

Definition 4.1. Let K be a cone. The dual cone K* is defined

    K* = {y : ⟨x, y⟩ ≥ 0 for all x ∈ K}.

In formulating the dual of LP, we used ((R^+)^n)* = (R^+)^n. We now need to understand the dual of PSD_n.

Claim 4.2. PSD_n* = PSD_n.

We write the dual as

    Primal                             Dual
    min  C • X                         max  ⟨y, b⟩
    ∀i = 1..m, A_i • X = b_i           C − Σ y_i A_i ⪰ 0
    X ⪰ 0

Note that the constraint C − Σ y_i A_i ⪰ 0 is equivalent to Σ y_i A_i ⪯ C.
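The bounding argument behind (3) can be illustrated on a tiny instance. A sketch with made-up data: primal min C • X subject to A_1 • X = 2 and X ⪰ 0 with C = A_1 = I, whose dual is max 2y subject to (1 − y)I ⪰ 0, i.e. y ≤ 1.

```python
# Weak duality on a 2x2 instance: for a feasible primal X and a dual y with
# C - y*A_1 PSD, the dual value y*b_1 lower-bounds the primal value C . X.

def frob(A, B):
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

C  = [[1.0, 0.0], [0.0, 1.0]]   # objective matrix (identity)
A1 = [[1.0, 0.0], [0.0, 1.0]]   # single constraint matrix (identity)
b1 = 2.0

X = [[1.5, 0.0], [0.0, 0.5]]    # feasible primal point: trace 2, diagonal >= 0, so PSD
y = 0.8                         # feasible dual point: C - y*A1 = 0.2*I is PSD

assert frob(A1, X) == b1                         # primal feasibility
slack = [[C[i][j] - y * A1[i][j] for j in range(2)] for i in range(2)]
assert all(slack[i][i] >= 0 for i in range(2))   # the diagonal slack matrix is PSD
assert frob(C, X) >= y * b1                      # weak duality: 2.0 >= 1.6
```

Here the primal optimum (trace X = 2) and the dual optimum (y = 1, value 2) coincide, consistent with the strong duality condition of Theorem 5.1, since both the primal point and the slack matrix can be chosen strictly positive definite.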
5  Differences between LP and SDP

Many of the nice LP properties do not carry over to SDP:

- The optimum cannot always be attained, even if the problem is bounded.
- Even when the optimum is attained, it may be irrational, and so the problem can't be solved exactly on a finite machine.
- There is no analogue in SDP of LP's Basic Feasible Solutions (BFS).
- There can be a duality gap: strong duality does not necessarily hold (but see Theorem 5.1 below).
- Implied by the above, there is no natural bound on the size of the solution in terms of the input size.

Theorem 5.1. If there is a feasible primal solution X ≻ 0 (all eigenvalues strictly positive) and a feasible dual solution y with C − Σ y_i A_i ≻ 0, then both primal and dual attain their optimum, and the two optima are the same.