Semidefinite and Second Order Cone Programming Seminar Fall 2001 Lecture 2

Instructor: Farid Alizadeh
Scribe: Xuan Li
9/17/2001

1 Overview

We survey the basic notions of cones and cone-LP and give several examples, mostly related to semidefinite programming.

2 Program Formulations

The linear and semidefinite programming problems are formulated as follows.

2.1 Standard Form Linear Programming

Let $c \in \mathbb{R}^n$, $b \in \mathbb{R}^m$, and $A \in \mathbb{R}^{m \times n}$ with rows $a_i \in \mathbb{R}^n$, $i = 1, \dots, m$.

\[
\begin{array}{ll}
\min & c^T x \\
\text{s.t.} & a_i x = b_i, \quad i = 1, \dots, m \\
 & x \ge 0
\end{array}
\qquad (1)
\]

2.2 Semidefinite Programming

Here, instead of vectors $a_i$, we use symmetric matrices $A_i \in S^{n \times n}$ (the set of $n \times n$ symmetric matrices), $i = 1, \dots, m$, and $C \in S^{n \times n}$ and $X \in S^{n \times n}$ instead of $c$ and $x$. The matrix $X$ is required to be positive semidefinite. The inner product is defined as

\[
A \bullet B = \sum_{i,j} A_{ij} B_{ij} = \mathrm{Trace}(AB^T) = \mathrm{Tr}(AB) = \mathrm{Tr}(BA).
\]
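The chain of equalities in the inner product definition can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper names `inner` and `random_symmetric` are illustrative and do not come from the lecture.

```python
import numpy as np

def inner(A, B):
    """A • B = sum over i, j of A_ij * B_ij (entrywise product, then sum)."""
    return float(np.sum(A * B))

rng = np.random.default_rng(0)

def random_symmetric(n):
    """Hypothetical helper: symmetrize a random Gaussian matrix."""
    M = rng.standard_normal((n, n))
    return (M + M.T) / 2

A = random_symmetric(4)
B = random_symmetric(4)

# The four expressions from the definition; they should agree up to rounding.
values = [inner(A, B), np.trace(A @ B.T), np.trace(A @ B), np.trace(B @ A)]
print(values)
```

For symmetric inputs the four printed numbers should coincide up to floating-point error.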

In the chain of equalities defining $A \bullet B$, the second equality follows from the definition of the matrix product, and the third from the symmetry of $B$. The last one comes from the observation that, even though the matrix product is not commutative (i.e., $AB \neq BA$ in general), for symmetric $A$ and $B$ the diagonal entries of $AB$ and $BA$ are equal, and thus their traces are equal as well. The standard form of semidefinite programming is:

\[
\begin{array}{ll}
\min & C \bullet X \\
\text{s.t.} & A_i \bullet X = b_i, \quad i = 1, \dots, m \\
 & X \succeq 0
\end{array}
\]

3 Some Notations and Definitions

cone: A set $K$ is called a cone if $\alpha x \in K$ for each $x \in K$ and for each $\alpha \ge 0$.

convex cone: A convex cone $K$ is a cone with the additional property that $x + y \in K$ for each $x, y \in K$.

pointed cone: A pointed cone $K$ is a cone with the property that $K \cap (-K) = \{0\}$.

open set: A set $S$ is open if for every point $s \in S$, $B(s, \epsilon_s) = \{x : \|x - s\| < \epsilon_s\} \subseteq S$ for some positive number $\epsilon_s$.

closed set: A set $S$ is closed if its complement $S^c$ is open.

interior of a set: The interior of a set $S$ is defined as $\mathrm{Int}(S) := \bigcup_{T \subseteq S,\ T \text{ open}} T$.

closure of a set: The closure of a set $S$ is defined as $\mathrm{Cl}(S) := \bigcap_{T \supseteq S,\ T \text{ closed}} T$.

boundary of a set: The boundary of a set $S$ is defined as $\mathrm{Bd}(S) := \mathrm{Cl}(S) \cap \mathrm{Int}(S)^c$.

Remark 1 There are some basic facts which can be easily seen from the definitions above:

1. A nonempty open set in $\mathbb{R}^n$ is not open in $\mathbb{R}^m$ for $n < m$;
2. similarly, the boundary or the interior of a set is not the same in $\mathbb{R}^n$ as in $\mathbb{R}^m$;
3. as a result, one talks about an open set with respect to the topology induced by the vector space spanned by a set $S$;
4. similarly, we speak of the relative interior and relative boundary of a set, which are understood to be with respect to the topology of the space spanned by the set;
5. a closed set in $\mathbb{R}^n$ is also closed in $\mathbb{R}^m$.

Consider the half-closed interval $[a, b) = \{x : a \le x < b\}$ in $\mathbb{R}^1$. The interior of $[a, b)$ in $\mathbb{R}^1$ is the open interval $(a, b)$ and the boundary of $[a, b)$ is $\{a\} \cup \{b\}$. But $(a, b)$ is not open in $\mathbb{R}^2$, since for any $x \in (a, b)$ we cannot find an $\epsilon > 0$ such that $B(x, \epsilon) \subseteq (a, b)$. The interior of $[a, b)$ in $\mathbb{R}^2$ is empty and the boundary of $[a, b)$ in $\mathbb{R}^2$ is $[a, b]$. However, the relative interior of $[a, b)$ in $\mathbb{R}^n$ is again $(a, b)$ and the relative boundary is $\{a, b\}$.

Definition 1 (Proper Cone) A proper cone $K \subseteq \mathbb{R}^n$ is a closed, pointed, convex, and full-dimensional cone (i.e., $\dim(K) = n$). A full-dimensional cone is a cone which contains $n$ linearly independent vectors.

Theorem 1 Every proper cone $K$ induces a partial order, defined as follows: for $x, y \in \mathbb{R}^n$,

\[
x \succeq_K y \iff x - y \in K, \qquad x \succ_K y \iff x - y \in \mathrm{Int}(K).
\]

Proof: First note that $x \succeq_K x$ since $x - x = 0 \in K$. Secondly, if $x \succeq_K y$ and $y \succeq_K x$, then $x - y \in K$ and $y - x \in K$. Since $K$ is a proper cone, and thus a pointed cone, we get $x = y$. Finally, if $x \succeq_K y$ and $y \succeq_K z$, then $x - z = (x - y) + (y - z) \in K$ because $K$ is a convex cone, i.e., $x \succeq_K z$.

4 The Standard Cone Linear Programming (K-LP)

\[
\begin{array}{ll}
\min & c^T x \\
\text{s.t.} & a_i^T x = b_i, \quad i = 1, \dots, m \\
 & x \succeq_K 0
\end{array}
\]

where $c \in \mathbb{R}^n$, $b \in \mathbb{R}^m$, and $A \in \mathbb{R}^{m \times n}$ with rows $a_i \in \mathbb{R}^n$, $i = 1, \dots, m$.

Observe that every convex optimization problem

\[
\min_{x \in C} f(x),
\]

where $C$ is a convex set and $f(x)$ is convex over $C$, can be turned into a cone-LP. First turn the problem into one with a linear objective, and then turn it into a cone-LP:

\[
\begin{array}{ll}
\min & z \\
\text{s.t.} & f(x) - z \le 0 \\
 & x \in C.
\end{array}
\]

Since the set $C' = \{(z, x) \mid x \in C \text{ and } f(x) - z \le 0\}$ is convex, our problem is now equivalent to the cone-LP

\[
\begin{array}{ll}
\min & z \\
\text{s.t.} & x_0 = 1 \\
 & (x_0, z, x) \succeq_K 0
\end{array}
\]

where $K = \{(x_0, z, x) \mid (z, x) \in x_0 C' \text{ and } x_0 \ge 0\}$.

[Figure: the convex set embedded in a plane and turned into a cone.]

Definition 2 (Dual Cone) The dual cone $K^*$ of a proper cone $K$ is the set $\{z : z^T x \ge 0,\ \forall x \in K\}$. It is easy to prove that if $K$ is proper, so is $K^*$.
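The homogenization step in Section 4, which scales the epigraph set $\{(z, x) : x \in C,\ f(x) \le z\}$ by $x_0 \ge 0$ to obtain a cone, can be illustrated with a small membership test. This is only a sketch under stated assumptions: the choice $f(x) = x^2$ on $C = [-1, 1]$ and the function names `in_epigraph` and `in_cone` are hypothetical, and scalars stand in for vectors.

```python
def in_epigraph(z, x, f, in_C, tol=1e-12):
    """Test (z, x) in the epigraph set: x in C and f(x) - z <= 0."""
    return in_C(x) and f(x) - z <= tol

def in_cone(x0, z, x, f, in_C, tol=1e-12):
    """Test (x0, z, x) in K: (z, x) lies in x0 times the epigraph set, x0 >= 0."""
    if x0 < 0:
        return False
    if x0 == 0:
        # Zero times a nonempty set is just the origin.
        return z == 0 and x == 0
    return in_epigraph(z / x0, x / x0, f, in_C, tol)

# Illustrative data (an assumption, not from the lecture): f(x) = x^2 on C = [-1, 1].
f = lambda x: x * x
in_C = lambda x: -1.0 <= x <= 1.0

print(in_cone(1.0, 1.0, 0.5, f, in_C))   # slice x0 = 1 is the epigraph set itself
print(in_cone(3.0, 3.0, 1.5, f, in_C))   # the same point scaled by 3 stays in K
print(in_cone(1.0, 0.1, 0.5, f, in_C))   # f(0.5) = 0.25 > 0.1, so not in K
```

Fixing $x_0 = 1$ recovers the original feasible set, which is why the constraint $x_0 = 1$ appears in the cone-LP.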

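Example 1 (Half line) Let $\mathbb{R}_+ = \{x : x \ge 0\}$. The dual cone $\mathbb{R}_+^*$ is exactly $\mathbb{R}_+$.

Example 2 (Non-negative orthant) Let $\mathbb{R}^n_+ = \{x \mid x_k \ge 0 \text{ for } k = 1, \dots, n\}$. The dual cone equals $\mathbb{R}^n_+$; that is, the non-negative orthant is self-dual.

We recall that

Lemma 1 A symmetric matrix $X$ is positive semidefinite if it satisfies any one of the following equivalent conditions:
1. $a^T X a \ge 0$ for all $a \in \mathbb{R}^n$;
2. there exists $A \in \mathbb{R}^{n \times n}$ such that $AA^T = X$;
3. all eigenvalues of $X$ are non-negative.

Example 3 (The semidefinite cone) Let $P^{n \times n} = \{X \in S^{n \times n} : X \text{ is positive semidefinite}\}$. Now we are interested in $(P^{n \times n})^*$.

On one side, let $Z \in (P^{n \times n})^*$, i.e., $Z \bullet X \ge 0$ for all $X \succeq 0$. Writing $X = AA^T$, this says

\[
Z \bullet X = \mathrm{Tr}(ZX) = \mathrm{Tr}(ZAA^T) = \mathrm{Tr}(A^T Z A) \ge 0 \quad \text{for all } A.
\]

Recall that any symmetric matrix $X$ can be written as $X = Q \Lambda Q^T$, where $QQ^T = I$ (that is, $Q$ is an orthogonal matrix) and $\Lambda$ is diagonal with the eigenvalues of $X$ on its diagonal. Write $Q = [q_1, \dots, q_n]$ and $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$; then $q_i$ is the eigenvector corresponding to $\lambda_i$, i.e., $q_i^T X q_i = \lambda_i$. Apply this to $Z$: let us choose $A_i = p_i \in \mathbb{R}^n$, where $p_i$ is the eigenvector of $Z$ corresponding to the eigenvalue $\gamma_i$ and $p_i^T p_i = 1$ (so that $X = p_i p_i^T \succeq 0$). Then

\[
0 \le \mathrm{Tr}(A_i^T Z A_i) = p_i^T Z p_i = \gamma_i.
\]

So all the eigenvalues of $Z$ are non-negative, i.e., $Z \in P^{n \times n}$, and hence $(P^{n \times n})^* \subseteq P^{n \times n}$.

On the other hand, for every $Y \in P^{n \times n}$ there exists $B \in \mathbb{R}^{n \times n}$ such that $Y = BB^T$. For every $X \in P^{n \times n}$, $X = AA^T$, we have

\[
Y \bullet X = \mathrm{Tr}(YX) = \mathrm{Tr}(BB^T AA^T) = \mathrm{Tr}(A^T BB^T A) = \mathrm{Tr}[(B^T A)^T (B^T A)] \ge 0,
\]

i.e., $Y \in (P^{n \times n})^*$, and hence $P^{n \times n} \subseteq (P^{n \times n})^*$. In conclusion, $(P^{n \times n})^* = P^{n \times n}$.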
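Both directions of the self-duality argument in Example 3 can be checked numerically. The sketch below assumes NumPy; the helper `random_psd`, the chosen eigenvalues of `Z`, and the variable names are illustrative. It samples pairs of positive semidefinite matrices and confirms that their inner product is non-negative. It then builds a symmetric $Z$ with one negative eigenvalue and uses the rank-one matrix $p p^T$ from the proof as a witness that $Z$ is not in the dual cone.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

def random_psd(n):
    """Hypothetical helper: B B^T is positive semidefinite for any square B."""
    B = rng.standard_normal((n, n))
    return B @ B.T

# Direction P ⊆ P*: Tr(YX) >= 0 for PSD X and Y.
min_dot = min(np.trace(random_psd(n) @ random_psd(n)) for _ in range(100))

# Direction P* ⊆ P, by contrapositive: a symmetric Z with a negative
# eigenvalue gamma has Z • (p p^T) = gamma < 0 for the unit eigenvector p.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))      # random orthogonal matrix
Z = Q @ np.diag([2.0, 1.0, 0.5, -0.3]) @ Q.T
eigvals, eigvecs = np.linalg.eigh(Z)                   # eigenvalues in ascending order
p = eigvecs[:, 0]                                      # eigenvector of the smallest one
witness_value = float(np.trace(Z @ np.outer(p, p)))

print(min_dot, witness_value)
```

The first printed number should be non-negative and the second should be close to the negative eigenvalue that was planted in `Z`.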

Example 4 (The second order cone) Let $Q = \{(x_0, \bar{x}) \mid x_0 \ge \|\bar{x}\|\}$. $Q$ is a proper cone. What is $Q^*$?

On one side, if $z = (z_0, \bar{z}) \in Q$, then for every $(x_0, \bar{x}) \in Q$

\[
(z_0, \bar{z}^T) \begin{pmatrix} x_0 \\ \bar{x} \end{pmatrix} = z_0 x_0 + \bar{z}^T \bar{x} \ge \|\bar{z}\| \, \|\bar{x}\| + \bar{z}^T \bar{x} \ge -\bar{z}^T \bar{x} + \bar{z}^T \bar{x} = 0,
\]

i.e., $Q \subseteq Q^*$. The inequalities come from the Cauchy-Schwarz inequality:

\[
-\bar{z}^T \bar{x} \le |\bar{z}^T \bar{x}| \le \|\bar{z}\| \, \|\bar{x}\|.
\]

On the other side, we note that $e = (1, 0) \in Q$. For each element $z = (z_0, \bar{z}) \in Q^*$ we must have $z^T e = z_0 \ge 0$. We also note that each vector of the form $x = (\|\bar{z}\|, -\bar{z})$ is in $Q$, for all $\bar{z} \in \mathbb{R}^n$. Thus, in particular, for $z = (z_0, \bar{z}) \in Q^*$,

\[
z^T x = z_0 \|\bar{z}\| - \|\bar{z}\|^2 \ge 0.
\]

Since $\|\bar{z}\|$ is always non-negative, we get $z_0 \ge \|\bar{z}\|$ (when $\bar{z} = 0$ this is the inequality $z_0 \ge 0$ already shown), i.e., $Q^* \subseteq Q$. Therefore, $Q = Q^*$.

Definition 3 An extreme ray of a proper cone $K$ is a half line $\alpha x = \{\alpha x \mid \alpha \ge 0\}$ for $x \in K$ such that for each $a \in \alpha x$, if $a = b + c$ with $b, c \in K$, then $b, c \in \alpha x$.

Example 5 (Extreme rays of the second order cone) Let $Q$ be the second order cone. The vectors $x = (\|\bar{x}\|, \bar{x})$ define the extreme rays of $Q$. This is fairly easy to prove.

Example 6 (Extreme rays of the semidefinite cone) Let $P^{n \times n}$ be the semidefinite cone. Positive semidefinite matrices $qq^T$ of rank 1 form the extreme rays of $P^{n \times n}$. Here is the proof. Any positive semidefinite matrix $X$ can be written in the form $X = \sum_i \lambda_i p_i p_i^T$ (see the previous lecture for how to get this from the spectral decomposition of $X$). This shows that all extreme rays must be among matrices of the form $qq^T$. Now we must show that each $qq^T$ is an extreme ray. Let $qq^T = X + Y$, where $X, Y \succeq 0$. Suppose $\{q_1 = q, q_2, \dots, q_n\}$ is an orthogonal set of vectors in $\mathbb{R}^n$. Then multiplying from the left by $q_i^T$ and from the right by $q_i$, we see that $q_i^T X q_i + q_i^T Y q_i = 0$ for $i = 2, \dots, n$; since the summands are both non-negative and add up to zero, they are both zero. Thus $q_i^T X q_i = q_i^T Y q_i = 0$ for $i = 2, \dots, n$. For a positive semidefinite matrix, $q_i^T X q_i = 0$ implies $X q_i = 0$, so both $X$ and $Y$ have rank at most one (their null spaces have dimension at least $n - 1$), and we might as well write $qq^T = xx^T + yy^T$. But the right-hand side is a rank 2 matrix unless $x$ and $y$ are proportional, which proves they are proportional to $q$.

Thus, $qq^T$ is an extreme ray for each vector $q \in \mathbb{R}^n$.
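The two halves of the self-duality argument for the second order cone in Example 4 can also be checked numerically. This is a minimal sketch assuming NumPy; `in_soc`, `random_soc_point`, and the specific vector `z` are hypothetical names and data chosen for illustration.

```python
import numpy as np

def in_soc(v, tol=1e-12):
    """Membership test for Q: first entry at least the norm of the rest."""
    return v[0] >= np.linalg.norm(v[1:]) - tol

def random_soc_point(rng, n):
    """Hypothetical sampler: pick xbar, then x0 = ||xbar|| + non-negative slack."""
    xbar = rng.standard_normal(n)
    x0 = np.linalg.norm(xbar) + rng.random()
    return np.concatenate(([x0], xbar))

rng = np.random.default_rng(2)
pts = [random_soc_point(rng, 3) for _ in range(200)]

# Q ⊆ Q*: inner products between members of Q are non-negative.
min_inner = min(float(u @ v) for u in pts for v in pts)

# Q* ⊆ Q, by contrapositive: for z outside Q, the vector (||zbar||, -zbar)
# from the proof lies in Q and has negative inner product with z.
z = np.array([1.0, 2.0, 0.0, 0.0])                     # z0 = 1 < ||zbar|| = 2
witness = np.concatenate(([np.linalg.norm(z[1:])], -z[1:]))
witness_value = float(z @ witness)                     # z0*||zbar|| - ||zbar||^2

print(min_inner, witness_value)
```

Here the witness value equals $z_0 \|\bar{z}\| - \|\bar{z}\|^2 = 2 - 4 = -2$, matching the expression in the proof.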

4.1 An Example of a Cone Which Is Not Self-Dual

In the examples above, all the cones were self-dual. But there are cones that are not self-dual. Let $\mathcal{F}$ be the set of functions $F : \mathbb{R} \to \mathbb{R}$ with the following properties:
1. $F$ is right continuous,
2. $F$ is non-decreasing (i.e., if $x > y$ then $F(x) \ge F(y)$), and
3. $F$ has bounded variation, that is, $F(x) \to \alpha > -\infty$ as $x \to -\infty$, and $F(x) \to \beta < \infty$ as $x \to \infty$.

First observe that functions in $\mathcal{F}$ are almost like probability distribution functions, except that their range is the interval $[\alpha, \beta]$ rather than $[0, 1]$. Second, the set $\mathcal{F}$ itself is a convex cone, and in fact a pointed cone, in the space of right-continuous functions.

Now we define a particular kind of moment cone. First, let us define

\[
u_x = (1, x, x^2, \dots, x^n)^T.
\]

The moment cone is defined as:

\[
M_{n+1} = \left\{ c = \int u_x \, dF(x) : F \in \mathcal{F} \right\},
\]

that is, $M_{n+1}$ consists of vectors $c$ where, for each $j = 0, \dots, n$, $c_j$ is the $j$-th moment of a distribution times a non-negative constant.

Lemma 2 $M_{n+1}$ is a proper cone.

Proof: Let us examine the properties we need to prove.

$c \in M_{n+1}$ and $\alpha \ge 0$ imply $\alpha c \in M_{n+1}$. To see this, observe that there exists $F \in \mathcal{F}$ such that $c = \int u_x \, dF(x)$. Now if $F$ is right continuous, non-decreasing, and of bounded variation, then all these properties also hold for $\alpha F$ for each $\alpha \ge 0$, and thus $\alpha F \in \mathcal{F}$. Therefore, $\alpha c = \int u_x \, d(\alpha F(x)) \in M_{n+1}$. Thus $M_{n+1}$ is a cone.

If $c$ and $d$ are in $M_{n+1}$, then $c + d \in M_{n+1}$. Indeed, if $c = \int u_x \, dF_1(x) \in M_{n+1}$ and $d = \int u_x \, dF_2(x) \in M_{n+1}$, then

\[
c + d = \int u_x \, d[F_1(x) + F_2(x)] \in M_{n+1}.
\]

Thus $M_{n+1}$ is a convex cone.

If $c$ and $-c$ are in $M_{n+1}$, then $c = 0$. If $c = \int u_x \, dF_1(x) \in M_{n+1}$ and $-c \in M_{n+1}$, then $-c = \int u_x \, dF_2(x)$ for some $F_2 \in \mathcal{F}$. Hence

\[
c + (-c) = 0 = \int u_x \, d[F_1(x) + F_2(x)].
\]

In particular, reading off the first component (the first entry of $u_x$ is 1), $\int d[F_1(x) + F_2(x)] = 0$. Since $F_1 + F_2 \in \mathcal{F}$ is non-decreasing with $F_1(x) + F_2(x) \to 0$ as $x \to -\infty$, we get $F_1(x) + F_2(x) = 0$ almost everywhere, i.e., $F_i(x) = 0$, $i = 1, 2$, almost everywhere. This means $c = 0$, i.e., $M_{n+1} \cap (-M_{n+1}) = \{0\}$. Thus $M_{n+1}$ is a pointed cone.

$M_{n+1}$ is full-dimensional. Let

\[
F_a(x) = \begin{cases} 0, & \text{if } x < a \\ 1, & \text{if } x \ge a. \end{cases}
\]

Obviously, $F_a \in \mathcal{F}$ and $u_a = \int u_x \, dF_a(x) \in M_{n+1}$ for all $a \in \mathbb{R}$. Choose $n + 1$ distinct values $a_1, \dots, a_{n+1}$; then

\[
\det[u_{a_1}, \dots, u_{a_{n+1}}] = \prod_{i > j} (a_i - a_j) \neq 0.
\]

Thus $M_{n+1}$ is a full-dimensional cone. (The determinant above is the well-known Vandermonde determinant.)

In addition, we need to show that $M_{n+1}$ is closed. This will be taken up in future lectures.

Example 7 (Extreme rays of $M_{n+1}$) The extreme rays of $M_{n+1}$ are all $\alpha u_x$ for $x \in \mathbb{R}$. If $c \in M_{n+1}$, then $c$ can be written as $\alpha_1 u_{x_1} + \alpha_2 u_{x_2} + \cdots + \alpha_{n+1} u_{x_{n+1}}$, with $\alpha_i \ge 0$ for $i = 1, \dots, n + 1$. There is a one-to-one correspondence between $c \in M_{n+1}$ and

\[
H = \alpha_1 u_{x_1} u_{x_1}^T + \alpha_2 u_{x_2} u_{x_2}^T + \cdots + \alpha_{n+1} u_{x_{n+1}} u_{x_{n+1}}^T.
\]

Such a matrix is called a Hankel matrix. In general, Hankel matrices are those matrices $H$ such that $H_{ij} = h_{i+j}$, that is, the entries are constant along all opposite diagonals. A vector $c \in \mathbb{R}^{2n+1}$ is in the moment cone if and only if the Hankel matrix $H_{ij} = c_{i+j}$ is positive semidefinite. Again, these assertions will be proved in future lectures.

Now we examine $M_{n+1}^*$. Let us first consider the cone defined as follows:

\[
P_{n+1} = \{p = (p_0, \dots, p_n) \mid p_0 + p_1 x + p_2 x^2 + \cdots + p_n x^n = p(x) \ge 0 \text{ for all } x\}.
\]

Lemma 3 Every non-negative polynomial is the sum of square polynomials.

Proof: First, it is well known that a polynomial $p(x)$ of degree $n$ with $k$ pairs of complex conjugate roots $\alpha_j \pm i\beta_j$ and real roots $\gamma_j$ can be written as

\[
p(x) = c \left\{ \prod_{j=1}^{k} \big[ (x - \alpha_j - i\beta_j)(x - \alpha_j + i\beta_j) \big] \right\} \left\{ \prod_{j=1}^{n-2k} (x - \gamma_j) \right\}
\]

where $i = \sqrt{-1}$ and $c \ge 0$. We first claim that $n$ must be even. Otherwise, $p(x) \to -\infty$ as $x \to -\infty$, and $p(x)$ cannot be non-negative. The number of real roots is consequently even, say $2l = n - 2k$. Since $p(x) \ge 0$, all the real roots must have even multiplicity, because otherwise in the neighborhood of a root with odd multiplicity there is some $t$ such that $p(t) < 0$. Thus, after relabeling the real roots so that each repeated pair is listed once, we can write

\[
p(x) = c \left\{ \prod_{j=1}^{k} \big[ (x - \alpha_j - i\beta_j)(x - \alpha_j + i\beta_j) \big] \right\} \left\{ \prod_{j=1}^{l} (x - \gamma_j)^2 \right\}.
\]

On the other hand, for each pair of conjugate complex roots we have

\[
(x - \alpha - i\beta)(x - \alpha + i\beta) = (x - \alpha)^2 + \beta^2.
\]

Therefore the product expression for $p(x)$ is a product of square polynomials and sums of square polynomials, which, when expanded, yields a sum of square polynomials.

This means that the set of extreme rays of the non-negative polynomials is among polynomials that are squares $q^2(x)$. Thus, the coefficients of extreme rays are of the form $q * q = q^2$, where $a * b$ is the convolution of vectors $a$ and $b$; that is, for $a, b \in \mathbb{R}^{n+1}$, $a * b \in \mathbb{R}^{2n+1}$ is defined as

\[
a * b = (a_0 b_0,\ a_0 b_1 + a_1 b_0,\ \dots,\ a_0 b_k + a_1 b_{k-1} + \cdots + a_k b_0,\ \dots,\ a_n b_n)^T
\]

and $q^2 = q * q$.

Now, not all square polynomials are extreme rays. In particular, if a square polynomial has non-real roots, then it can be written as a sum of two square polynomials, as shown above. Thus, extreme rays are among those square polynomials with only real roots. We now argue that these polynomials are indeed extreme rays.

Suppose $p(x) = \prod_j (x - \gamma_j)^{2k_j}$ is a polynomial with distinct real roots $\gamma_j$ which is not an extreme ray. Then $p(x) = q(x) + r(x)$ and, since both $q$ and $r$ are non-negative, we must have $q(x) \le p(x)$. This means that the degree of $q(x)$ is at most as large as the degree of $p$. Furthermore, each $\gamma_j$ is also a root of $q(x)$, since $0 \le q(\gamma_j) \le p(\gamma_j) = 0$. But if for some $\gamma_j$ the multiplicity in $p$ is $2k$ and the multiplicity in $q$ is $2m$ where $m < k$, then in some neighborhood of $\gamma_j$ we would have $q(x) > p(x)$, because $(x - \gamma_j)^{2m} > (x - \gamma_j)^{2k}$ in some neighborhood of $\gamma_j$ when $m < k$; therefore, $k \le m$ for each root. Since the degree of $p$ is larger than or equal to the degree of $q$, it follows that $k = m$ for each root. Thus $q(x) = \alpha p(x)$ for some constant $\alpha$. We have proved:

Corollary 1 $p$ is an extreme ray of $P_{n+1}$ if $p = q^2$ and $q(x)$ has only real roots.

We now show that $P_{n+1} \subseteq M_{n+1}^*$. Note that for $c = \sum_{i=1}^{n+1} \beta_i u_{x_i} \in M_{n+1}$ and $\sum_i p_i^2 \in P_{n+1}$,

\[
\Big( \sum_i p_i^2 \Big)^T \Big( \sum_{j=1}^{n+1} \beta_j u_{x_j} \Big) = \sum_{i,j} \beta_j \big[ (p_i^2)^T u_{x_j} \big] \ge 0,
\]

since $\beta_j \ge 0$ and $(p_i^2)^T u_{x_j} = [p_i(x_j)]^2 \ge 0$.

Later in the course we will prove that $P_{n+1} = M_{n+1}^*$.
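The inclusion $P_{n+1} \subseteq M_{n+1}^*$ rests on two facts: the coefficient vector of $q(x)^2$ is the convolution $q * q$, and the inner product of a coefficient vector with $u_x$ evaluates the polynomial at $x$. The sketch below, assuming NumPy, checks both on one example; the polynomial `q`, the points `xs`, and the weights `betas` are arbitrary illustrative choices, and `u` is a hypothetical helper name.

```python
import numpy as np
from numpy.polynomial.polynomial import polyval

def u(x, deg):
    """Moment vector u_x = (1, x, x^2, ..., x^deg)."""
    return np.array([x**j for j in range(deg + 1)], dtype=float)

# q(x) = 1 - 3x + 2x^2, coefficients listed from the constant term upward.
q = np.array([1.0, -3.0, 2.0])
p = np.convolve(q, q)            # coefficients of q(x)^2, again low order first

# A point of the moment cone: non-negative combination of moment vectors.
xs = np.array([-1.0, 0.5, 2.0])
betas = np.array([0.3, 1.2, 0.7])
c = sum(b * u(x, len(p) - 1) for b, x in zip(betas, xs))

lhs = float(p @ c)                                               # p^T c
rhs = float(sum(b * polyval(x, q) ** 2 for b, x in zip(betas, xs)))

print(p, lhs, rhs)
```

Both sides equal $\sum_j \beta_j \, q(x_j)^2$, which is non-negative because every term is a non-negative weight times a square.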