Semidefinite and Second Order Cone Programming Seminar, Fall 2001, Lecture 4
Instructor: Farid Alizadeh   Scribe: Haengju Lee   10/1/2001

1 Overview

We examine the dual of the Fermat-Weber problem. Next we study optimality conditions in the form of a generalized complementary slackness theorem. Finally, we begin the study of the eigenvalue optimization problem as a semidefinite program.

2 The Dual of the Fermat-Weber Problem

Recall that the Fermat-Weber problem seeks a point in $m$-dimensional space whose weighted Euclidean distance from a set of $n$ given points is minimum (see Lecture 1). Given points $v_1, v_2, \ldots, v_n \in \mathbb{R}^m$ and weights $w_1, w_2, \ldots, w_n$, the problem can be formulated as follows:

\[ \min_x \; \sum_{i=1}^n w_i \|v_i - x\|. \]

The problem can be written equivalently as a cone-LP over $Q$, the second-order cone:

\[ \min \; w_1 z_1 + \cdots + w_n z_n \quad \text{s.t.} \quad z_i \ge \|v_i - x\|, \; i = 1, \ldots, n. \]
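This formulation is easy to check numerically. The sketch below is a rough illustration with made-up points and weights (and SciPy's Nelder-Mead as the solver, an arbitrary choice): it minimizes the weighted sum of distances directly and verifies the first-order optimality condition $\sum_i w_i (x - v_i)/\|v_i - x\| = 0$ at the computed minimizer.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative data (not from the lecture): n = 4 points in the plane.
V = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0], [5.0, 5.0]])
w = np.array([1.0, 1.0, 1.0, 2.0])

def obj(x):
    """Weighted sum of Euclidean distances to the points v_i."""
    return np.sum(w * np.linalg.norm(V - x, axis=1))

res = minimize(obj, x0=V.mean(axis=0), method="Nelder-Mead",
               options={"xatol": 1e-10, "fatol": 1e-10})
x_star = res.x

# At an interior optimum the gradient vanishes:
#   sum_i w_i (x - v_i) / ||v_i - x|| = 0.
d = np.linalg.norm(V - x_star, axis=1)
g = ((w / d)[:, None] * (x_star - V)).sum(axis=0)
assert np.linalg.norm(g) < 1e-4
```

The vanishing gradient is exactly the "balance of forces" condition that reappears in the dual interpretation below.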
Now, with $e = (1, 0, \ldots, 0)^T \in \mathbb{R}^{m+1}$ and $\hat{x} = \binom{0}{x}$,

\[ z_i \ge \|v_i - x\| \iff \binom{z_i}{x - v_i} \succeq_Q 0 \iff z_i e + \hat{x} \succeq_Q \binom{0}{v_i}. \]

The cone-LP formulation is therefore:

Primal: $\min \; w_1 z_1 + \cdots + w_n z_n$ s.t. $z_i e + \hat{x} \succeq_Q \binom{0}{v_i}$, $i = 1, \ldots, n$.

If we define a dual variable $y_i = \binom{y_{i0}}{\bar{y}_i}$ corresponding to the $i$-th second-order cone inequality in the primal, then the dual can be formulated as:

\[ \max \; \sum_{i=1}^n v_i^T \bar{y}_i \quad \text{s.t.} \quad y_{i0} = w_i \; (\text{from } z_i), \quad \bar{y}_1 + \cdots + \bar{y}_n = 0 \; (\text{from } x), \quad y_i \succeq_Q 0, \]

where the conic constraints $y_i \succeq_Q 0$ arise because $Q^* = Q$. After simplification (eliminating the $y_{i0}$) we get:

\[ \max \; \sum_{i=1}^n v_i^T \bar{y}_i \quad \text{s.t.} \quad \sum_{i=1}^n \bar{y}_i = 0, \quad \|\bar{y}_i\| \le w_i. \]

The dual of the Fermat-Weber problem has an interesting interpretation in mechanics. Assume the $w_i$ are weights of objects hanging from threads that pass through a set of holes in a table. We are to take the other ends of the threads and tie them together in a knot, which settles at a position of equilibrium, expending a minimal amount of energy. The $\bar{y}_i$ are then interpreted as forces, and they must add up to zero so that we have equilibrium. The condition $\|\bar{y}_i\| \le w_i$ simply states that the magnitude of the force exerted at the knot by the $i$-th object cannot exceed its weight. Writing $x^*$ for the optimal location, we can write the dual objective value as $\sum_i (v_i - x^*)^T \bar{y}_i$, since $(x^*)^T \sum_i \bar{y}_i = 0$. The optimum is thus the configuration with minimum potential energy. (Question:
Can you give an interpretation of the primal, and explain why the primal and dual objectives are equal at the optimum?)

[Figure: five weights $w_1, \ldots, w_5$ hanging from threads through holes in a table.]

3 Duality in different spaces

In many situations an $m$-dimensional cone can be expressed as the intersection of an $n$-dimensional cone and a linear space: $K_1 = K \cap L$, where $n > m$. Remembering that a linear space is also a cone and that its dual as a cone is simply its orthogonal complement $L^\perp$ (why?), we get $K_1^* = K^* + L^\perp$. Here $K_1^*$ is the dual of $K_1$ in the space $\mathbb{R}^n$. But if we take the dual within the space $L$, then the dual cone will be $m$-dimensional and different from $K_1^*$; let us call the dual of $K_1$ in the space $L$ $K_1^+$. If it is at all possible to find a good characterization of $K_1^+$, we should use it instead of $K_1^*$. Let us look at an example and see what the problems would be if we don't.

In linear programming our cone is the non-negative orthant $\mathbb{R}^n_+$, and cone-LP is simply the ordinary LP:

Primal: $\min \; c^T x$ s.t. $a_i^T x = b_i$, $i = 1, \ldots, m$; $x_j \ge 0$, $j = 1, \ldots, n$.
Dual: $\max \; b^T y$ s.t. $\sum_{i=1}^m y_i a_i + s = c$; $s_j \ge 0$, $j = 1, \ldots, n$.

Now suppose that we express the non-negative orthant as the intersection of the positive semidefinite cone and the linear space $L$ which consists of only diagonal
matrices, that is, $X \in L$ iff $X_{ij} = 0$ for all $i \ne j$. We define diagonal matrices $C = \mathrm{Diag}(c)$ and $A_i = \mathrm{Diag}(a_i)$, that is, matrices whose $(j,j)$ diagonal entries are $c_j$ (respectively $(a_i)_j$) and whose off-diagonal entries are all zero. Now the primal linear program can be written as a semidefinite program:

Primal: $\min \{\, C \bullet X \mid A_i \bullet X = b_i \text{ for } i = 1, \ldots, m, \; X_{ij} = 0 \text{ for } i \ne j, \; X \succeq 0 \,\}$

Note that the condition $X_{ij} = 0$ is the same as $(E_{ij} + E_{ji}) \bullet X = 0$, where $E_{ij}$ is the matrix with all entries 0 except the $(i,j)$ entry, which is one. Taking the dual of this SDP, we arrive at a problem that is not equivalent to the dual of the LP:

Dual: $\max \{\, b^T y \mid \sum_i y_i A_i + \sum_{i<j} s_{ij}(E_{ij} + E_{ji}) \preceq C \,\}$

Even if the original LP has unique primal and dual solutions, in general the dual of the SDP formulation need not have a unique solution. The constraints in the dual imply that $\sum_i y_i a_i \le c$, but there are in general infinitely many choices of $s_{ij}$ that can be added to a given optimal $y$. The lesson is that it is not a good idea to formulate an LP as an SDP (which was obvious at the outset). But for the same reason it is generally not a good idea to express the dual of a cone-LP over $K_1 = K \cap L$ as $K^* + L^\perp$.

As another example consider the second-order cone $Q$. We know that $x \succeq_Q 0$ iff $\mathrm{Arw}(x) \succeq 0$. Thus SOCP too can be expressed as an SDP: write $Q = P^{n \times n} \cap L$, where $L$ is the linear space of arrow-shaped matrices, i.e. $X_{ij} = 0$ if $i \ne j$, $i \ne 0$ and $j \ne 0$, and $X_{ii} = X_{jj}$ for all $i, j$. But again, formulating SOCP as an SDP is not a good idea. If we form the dual as an SDP we will have extra, unnecessary variables that play no essential role and can make the solution numerically unstable, even if the original SOCP does not have numerical problems. In future lectures we will see even more compelling reasons why the SOCP problem should be treated in its own right rather than as a special case of SDP.
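The equivalence $x \succeq_Q 0 \iff \mathrm{Arw}(x) \succeq 0$ is easy to verify numerically. A minimal sketch (the test vectors are illustrative choices, not from the lecture), where $\mathrm{Arw}(x)$ for $x = (x_0, \bar{x})$ is the matrix with $x_0$ on the diagonal and $\bar{x}$ along the first row and column:

```python
import numpy as np

def arw(x):
    """Arrow matrix Arw(x) = [[x0, xbar^T], [xbar, x0 * I]]."""
    x0, xbar = x[0], x[1:]
    n = len(xbar)
    A = x0 * np.eye(n + 1)
    A[0, 1:] = xbar
    A[1:, 0] = xbar
    return A

def in_soc(x, tol=1e-12):
    """Membership in the second-order cone: x0 >= ||xbar||."""
    return x[0] >= np.linalg.norm(x[1:]) - tol

x_in  = np.array([2.0, 1.0, 1.0])   # 2 >= sqrt(2): inside Q
x_out = np.array([1.0, 1.0, 1.0])   # 1 <  sqrt(2): outside Q

# Arw(x) is PSD exactly when x is in the second-order cone.
for x in (x_in, x_out):
    psd = np.linalg.eigvalsh(arw(x)).min() >= -1e-12
    assert psd == in_soc(x)
```

The eigenvalues of $\mathrm{Arw}(x)$ are $x_0 \pm \|\bar{x}\|$ and $x_0$ (with multiplicity $n-1$), which is why the two tests agree.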
4 Generalization of Complementary Slackness Conditions

Consider the pair of cone-LP problems:

Primal: $\min \; c^T x$ s.t. $Ax = b$, $x \succeq_K 0$.
Dual: $\max \; b^T y$ s.t. $A^T y + s = c$, $s \succeq_{K^*} 0$.

We studied before that at the optimum the following three relations hold:

\[ x \succeq_K 0, \quad s \succeq_{K^*} 0, \quad x^T s = 0. \]

In the case of LP, SDP and SOCP these conditions actually imply stronger relations, which we now examine.
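These three relations can be observed on a small linear program. The sketch below (with made-up data) solves a primal LP and its dual with SciPy's `linprog` and checks the conditions; for the non-negative orthant they tighten to the componentwise condition derived next.

```python
import numpy as np
from scipy.optimize import linprog

# A small primal LP in standard form: min c^T x, Ax = b, x >= 0.
c = np.array([1.0, 2.0, 0.0])
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])

primal = linprog(c, A_eq=A, b_eq=b, method="highs")  # default bounds: x >= 0
# Dual: max b^T y s.t. A^T y <= c, written as min -b^T y with y free.
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=(None, None), method="highs")

x = primal.x
y = dual.x
s = c - A.T @ y            # dual slack vector, s >= 0 at optimality

assert abs(c @ x - b @ y) < 1e-8        # zero duality gap: x^T s = 0
assert np.all(np.abs(x * s) < 1e-8)     # componentwise x_i s_i = 0
```

Here the gap condition $x^T s = 0$ together with $x, s \ge 0$ forces every product $x_i s_i$ to vanish.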
Example 1 (the non-negative orthant) When $K = K^* = \mathbb{R}^n_+$, at the optimum we have $x_i \ge 0$ and $s_i \ge 0$ for $i = 1, \ldots, n$, and $x^T s = 0$. These imply $x_i s_i = 0$ for $i = 1, \ldots, n$, because if a sum of non-negative numbers $x_i s_i$ is zero, each of them must be zero. This is the familiar complementary slackness theorem of linear programming.

Example 2 (the semidefinite cone) When $K = K^* = P^{n \times n}$, at the optimum $X \succeq 0$, $S \succeq 0$, and $X \bullet S = \mathrm{tr}(XS) = 0$. Since $S$ is symmetric positive semidefinite, it can be expressed as

\[ S = Q^T \Omega Q = (Q^T \Omega^{1/2} Q)(Q^T \Omega^{1/2} Q) = S^{1/2} S^{1/2}, \]

where $Q$ is an orthogonal matrix and $\Omega$ is a diagonal matrix containing the (non-negative) eigenvalues of $S$ on its diagonal. In fact each positive semidefinite matrix has a unique positive semidefinite square root, denoted $S^{1/2}$. Now,

\[ 0 = \mathrm{tr}(XS) = \mathrm{tr}\big(X S^{1/2} S^{1/2}\big) = \mathrm{tr}\big(S^{1/2} X S^{1/2}\big). \]

This implies that $S^{1/2} X S^{1/2} = 0$, because $S^{1/2} X S^{1/2}$ is itself a positive semidefinite matrix, with non-negative eigenvalues and trace zero. Since the trace is the sum of the eigenvalues, this is possible only when all eigenvalues are zero, which, in the case of symmetric matrices, implies that the matrix $S^{1/2} X S^{1/2}$ is zero. Thus, writing $A = X^{1/2} S^{1/2}$,

\[ 0 = S^{1/2} X S^{1/2} = \big(X^{1/2} S^{1/2}\big)^T \big(X^{1/2} S^{1/2}\big) = A^T A. \]

We know that $A^T A = 0$ iff $A = 0$; thus $X^{1/2} S^{1/2} = 0$, which implies $XS = 0$. We have shown:

Theorem 1 (Complementary slackness theorem for SDP) If $X$ is optimal for the primal SDP, $(y, S)$ is optimal for the dual SDP, and the duality gap $X \bullet S$ is zero, then $XS = 0$.

Example 3 (the second-order cone) When $K = K^* = Q$, we have $x \succeq_Q 0$, $s \succeq_Q 0$ and $x^T s = 0$, where $x, s \in \mathbb{R}^{n+1}$ are indexed from 0. This means that $x_0 \ge \|\bar{x}\|$, $s_0 \ge \|\bar{s}\|$, and $x^T s = 0$,
or equivalently (assuming for the moment $x_0 > 0$ and $s_0 > 0$):

\[ x_0^2 \ge x_1^2 + \cdots + x_n^2 \;\Rightarrow\; \frac{s_0}{x_0} \sum_i x_i^2 \le x_0 s_0 \tag{1} \]

\[ s_0^2 \ge s_1^2 + \cdots + s_n^2 \;\Rightarrow\; \frac{x_0}{s_0} \sum_i s_i^2 \le x_0 s_0 \tag{2} \]

\[ x^T s = 0 \;\Rightarrow\; -x_0 s_0 = x_1 s_1 + \cdots + x_n s_n \tag{3} \]

Now, adding (1), (2) and twice (3), we get

\[ \sum_i \left( \frac{x_i^2 s_0}{x_0} + \frac{s_i^2 x_0}{s_0} + 2 x_i s_i \right) \le 0, \quad \text{i.e.} \quad \sum_i \frac{(x_i s_0 + s_i x_0)^2}{x_0 s_0} \le 0. \]

Again, a sum of non-negative numbers is less than or equal to zero; therefore all of them must be zero. We thus have

\[ x_i s_0 + x_0 s_i = 0, \; i = 1, \ldots, n, \quad \text{and} \quad x^T s = 0. \]
We have shown:

Theorem 2 (Complementary slackness for SOCP) If $x \succeq_Q 0$, $s \succeq_Q 0$, and $x^T s = 0$, then $x_0 s_i + x_i s_0 = 0$ for $i = 1, \ldots, n$. These conditions (along with $x^T s = 0$) can be written more succinctly as

\[ \mathrm{Arw}(x)\, \mathrm{Arw}(s)\, e = 0. \]

We have implicitly assumed that $x_0 \ne 0$ and $s_0 \ne 0$. If $x_0 = 0$, then $\|\bar{x}\| \le x_0 = 0$ implies $x = 0$, and the theorem above is trivially true. The same holds when $s_0 = 0$.

5 A general complementary slackness theorem

For a proper cone $K \subseteq \mathbb{R}^n$, define

\[ C(K) = \left\{ \binom{x}{s} \in \mathbb{R}^{2n} \;\middle|\; x \succeq_K 0, \; s \succeq_{K^*} 0, \; x^T s = 0 \right\}. \]

On the surface, the set $C(K)$ seems to be a $(2n-1)$-dimensional set: its members have $2n$ coordinates, and the single equation $x^T s = 0$ leaves $2n - 1$ degrees of freedom. The condition $x \in K$ by itself imposes no restriction on the dimension, nor does the condition $s \in K^*$. Nevertheless, it turns out that $C(K)$ is actually an $n$-dimensional set! Here is why:

Theorem 3 There is a one-to-one, onto, and continuous mapping between $\mathbb{R}^n$ and $C(K)$.

Before we proceed to the proof we recall the following basic fact.

Fact 1 Let $S \subseteq \mathbb{R}^n$ be a closed convex set and $a \in \mathbb{R}^n$. Then there is a unique point $x = \Pi_S(a)$ in $S$ which is closest to $a$, i.e. there is a unique $x = \mathrm{argmin}_{y \in S} \|a - y\|$.

The unique point above is called the projection of $a$ onto $S$. The proof of this fact can be found in many texts and is based on Weierstrass's theorem. Now we give the proof of Theorem 3.

Proof: Let $a \in \mathbb{R}^n$ be an arbitrary point; set $x = \Pi_K(a)$ and define $s = x - a$. We will first show that $s \in K^*$ and $x^T s = 0$, and then show that the correspondence between $a$ and $(x, s)$ is one-to-one, onto and continuous.

For every $u \in K$, define the convex combination $u_\alpha = \alpha u + (1 - \alpha) x$, where $0 \le \alpha \le 1$, and define $\zeta(\alpha) = \|a - u_\alpha\|^2$. Then $\zeta(\alpha)$ is a differentiable function on the interval $[0, 1]$, and $\min_{0 \le \alpha \le 1} \zeta(\alpha)$ is attained at $\alpha = 0$.

Claim: $\frac{d\zeta}{d\alpha}\big|_{\alpha=0} \ge 0$.
Proof of Claim: Otherwise there would exist $\alpha$ in some neighborhood of 0 such that $\|a - u_\alpha\| < \|a - u_0\|$, contradicting the fact that $x = u_0$ is the closest point to $a$ in $K$.

From this claim,

\[ \frac{d\zeta}{d\alpha}\Big|_{\alpha=0} = -2(a - x)^T (u - x) \ge 0 \;\Rightarrow\; 2(x - a)^T (u - x) \ge 0 \;\Rightarrow\; s^T (u - x) \ge 0. \tag{4} \]

This latter inequality is true for any $u \in K$. If we choose $u = 2x$ then we get $s^T x \ge 0$; if we choose $u = x/2$ then $s^T x \le 0$. We conclude that $x^T s = 0$. Plugging this into (4) gives $s^T u \ge 0$ for all $u \in K$, which means $s \in K^*$. Thus for each $a$ we get a pair $(x, s) \in C(K)$. Clearly each $a$ results in a unique $(x, s)$, as the projection $x$ is unique and thus so is $s = x - a$. Also, both the projection operation and $s = x - a$ are continuous.

Conversely, if $(x, s) \in C(K)$, then we can set $a = x - s$. All we have to show now is that the projection of $a$ onto $K$ is $x$. Assume otherwise. Then there is a point $u \in K$ such that $\|a - u\| < \|a - x\|$, that is,

\[ (a - x)^T (a - x) > (a - u)^T (a - u) \;\Rightarrow\; x^T x - 2(x - s)^T x > u^T u - 2(x - s)^T u. \]

Noting that $x^T s = 0$, this gives

\[ 0 > u^T u + x^T x - 2 x^T u + 2 s^T u = \|u - x\|^2 + 2 s^T u, \]

which implies $s^T u < 0$, contradicting the fact that $s \in K^*$. (This proof is due to Osman Güler.)

Example 4 (the half line) Let us see what $C(K)$ looks like in the case of the half line, that is, when $K = K^* = \mathbb{R}_+$:

\[ C(\mathbb{R}_+) = \left\{ \binom{x}{s} \in \mathbb{R}^2 \;\middle|\; x \ge 0, \; s \ge 0, \; xs = 0 \right\}. \]

In other words, $C(\mathbb{R}_+)$ is the union of the non-negative parts of the $x$ and $s$ axes: it is the real line $\mathbb{R}$ bent at the origin by a 90° angle.

Now the implication of this theorem is that since $C(K)$ is $n$-dimensional, there must exist a set of $n$ equations, independent in some sense, that define the manifold $C(K)$. These $n$ equations are precisely the complementary slackness conditions. In the case of the non-negative orthant, the semidefinite cone and the second-order cone, we were able to obtain these equations explicitly. When the cone $K$ is given by a set of inequalities of the form $g_i(x) \le 0$ for $i = 1, \ldots, n$, with the $g_i$ homogeneous and convex, the classical Karush-Kuhn-Tucker conditions give a method of obtaining these equations.
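The construction in the proof can be made concrete for the second-order cone, where the projection has a well-known closed form: for $a = (a_0, \bar{a})$ with $t = \|\bar{a}\|$, the projection is $a$ itself if $a_0 \ge t$, zero if $a_0 \le -t$, and $\frac{a_0 + t}{2}(1, \bar{a}/t)$ otherwise. The sketch below (the vector `a` is an arbitrary illustrative choice) checks that $a \mapsto (\Pi_Q(a), \Pi_Q(a) - a)$ indeed lands on $C(Q)$ and satisfies the SOCP complementarity conditions of Theorem 2.

```python
import numpy as np

def proj_soc(a):
    """Projection of a = (a0, abar) onto the second-order cone Q."""
    a0, abar = a[0], a[1:]
    t = np.linalg.norm(abar)
    if a0 >= t:                    # already in Q
        return a.copy()
    if a0 <= -t:                   # in -Q: projection is the origin
        return np.zeros_like(a)
    coef = (a0 + t) / 2.0
    return np.concatenate(([coef], (coef / t) * abar))

a = np.array([0.5, 2.0, -1.0])     # arbitrary point, in neither Q nor -Q
x = proj_soc(a)                    # x = Pi_Q(a)
s = x - a                          # s = x - a, as in the proof

# (x, s) lies on C(Q): x in Q, s in Q* = Q, and x^T s = 0.
assert x[0] >= np.linalg.norm(x[1:]) - 1e-12
assert s[0] >= np.linalg.norm(s[1:]) - 1e-12
assert abs(x @ s) < 1e-12

# The stronger SOCP complementarity of Theorem 2: x0*s_i + s0*x_i = 0.
assert np.allclose(x[0] * s[1:] + s[0] * x[1:], 0)
```

This is the map of Theorem 3 specialized to $K = Q$: every $a \in \mathbb{R}^{n+1}$ produces one complementary pair, and $a$ is recovered from the pair as $a = x - s$.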
6 Eigenvalue Optimization

In this section we relate semidefinite programming to the eigenvalues $\lambda_1(A) \ge \lambda_2(A) \ge \cdots \ge \lambda_n(A)$ of a matrix $A \in S^{n \times n}$. Let us find an SDP formulation of the largest eigenvalue, $\lambda_1(A)$. This problem can be formulated by primal and dual SDPs as follows:

Primal: $\min \; z$ s.t. $zI - A \succeq 0$.
Dual: $\max \; A \bullet Y$ s.t. $I \bullet Y = \mathrm{tr}(Y) = 1$, $Y \succeq 0$.

The primal formulation simply says: find the smallest $z$ such that $z$ is larger than all eigenvalues of $A$. But $z$ is larger than all eigenvalues of $A$ iff $zI - A$ is positive semidefinite. The dual characterization is obtained by simply taking the dual. Now define the feasible set of the dual to be $S$:

Definition 1

\[ S = \{ Y \in S^{n \times n} \mid \mathrm{tr}(Y) = 1, \; Y \succeq 0 \} \tag{5} \]

\[ E = \{ q q^T \mid \|q\| = 1 \} \tag{6} \]

We can characterize the extreme points of $S$ as follows:

Theorem 4 $S$ is a convex set and the set of extreme points of $S$ is $E$.

Proof: Convexity of $S$ is obvious, since it is the intersection of the semidefinite cone and an affine set. $Y \succeq 0$ implies that

\[ Y = \omega_1 q_1 q_1^T + \cdots + \omega_k q_k q_k^T, \]

where $\sum_i \omega_i = \mathrm{tr}(Y) = 1$, $\omega_i \ge 0$, and $\|q_i\| = 1$. This shows that the extreme points of $S$ are among the elements of $E$. Now we prove that all elements of $E$ are extreme points. Otherwise, for some $qq^T$ there are $p$ and $r$ with $\|p\| = \|r\| = 1$, $pp^T \ne rr^T$, and $0 < \alpha < 1$ such that

\[ q q^T = \alpha p p^T + (1 - \alpha) r r^T = \big(\sqrt{\alpha}\, p, \; \sqrt{1 - \alpha}\, r\big) \big(\sqrt{\alpha}\, p, \; \sqrt{1 - \alpha}\, r\big)^T. \]

Since $pp^T \ne rr^T$ forces $p$ and $r$ to be linearly independent, the right-hand side has rank two, contradicting the fact that $\mathrm{rank}(qq^T) = 1$. So each $qq^T$ is an extreme point.

Since the optimum of a linear function over a compact convex set is attained at an extreme point, it follows that the $Y$ that maximizes $A \bullet Y$ in the dual characterization above can be taken of the form $Y = qq^T$ with $\|q\| = 1$. That is,

\[ \lambda_1(A) = \max_{\|q\| = 1} q^T A q. \]

This is a well-known result in linear algebra (the Rayleigh quotient characterization of $\lambda_1$) that we have proved using duality of SDP. In future lectures we will use this characterization to express optimization of eigenvalues over an affine class of matrices.
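The variational characterization is easy to confirm numerically. A small sketch (with a random symmetric matrix as illustrative data): the top eigenvector attains $\lambda_1$, and random unit vectors never exceed it.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                  # a random symmetric matrix

lam, V = np.linalg.eigh(A)         # eigh returns eigenvalues in ascending order
lam1 = lam[-1]                     # largest eigenvalue lambda_1(A)
q1 = V[:, -1]                      # corresponding unit eigenvector

# The maximizer of q^T A q over ||q|| = 1 attains lambda_1 ...
assert abs(q1 @ A @ q1 - lam1) < 1e-10

# ... and no other unit vector exceeds it.
for _ in range(200):
    q = rng.standard_normal(5)
    q /= np.linalg.norm(q)
    assert q @ A @ q <= lam1 + 1e-10
```

The random sampling is only a spot check, of course; the inequality $q^T A q \le \lambda_1$ for all unit $q$ is exactly what the SDP duality argument above proves.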