Summer School: Semidefinite Optimization Christine Bachoc Université Bordeaux I, IMB Research Training Group Experimental and Constructive Algebra Haus Karrenberg, Sept. 3 - Sept. 7, 2012
Duality Theory
Convex cones

Definition: Let $E$ be a finite dimensional Euclidean space with inner product denoted $(x, y)$. A convex cone $K$ in $E$ is a full dimensional subset of $E$ such that:
1. for all $\lambda \geq 0$ and all $x \in K$, $\lambda x \in K$,
2. for all $x, y \in K$, $x + y \in K$.
If moreover $K$ does not contain any pair $(x, -x)$ with $x \neq 0$, then $K$ is said to be pointed.
Convex cones: examples

Example 1: The non negative orthant
$$\mathsf{NO} = \{x \in \mathbb{R}^n : x_i \geq 0 \text{ for all } i \in [n]\}.$$
It is a polyhedral cone. It is closed, convex, pointed. Notation: $x \geq 0$ for $x \in \mathsf{NO}$.

Example 2: The second order cone, also called the ice cream cone:
$$\mathsf{SO} = \{(x, t) \in \mathbb{R}^n \times \mathbb{R}_{\geq 0} : x^T x \leq t^2\}.$$
It is closed, convex, pointed.
Convex cones: examples

Example 3: The positive semidefinite cone $\mathcal{S}^n_{\succeq 0}$ is the set of positive semidefinite matrices of size $n$. We recall the equivalence of:
1. $A \succeq 0$, i.e. $A$ is symmetric and its eigenvalues are non negative,
2. $x^T A x \geq 0$ for all $x \in \mathbb{R}^n$,
3. $A = P^T P$ for some $P \in \mathbb{R}^{k \times n}$ (Cholesky decomposition).

The non negative orthant and the second order cone are sections of $\mathcal{S}^n_{\succeq 0}$: indeed, $\mathsf{NO}$ can be identified with the subset of diagonal matrices in $\mathcal{S}^n_{\succeq 0}$, and $\mathsf{SO}$ with the set of matrices of the form
$$X := \begin{pmatrix} I_n & x \\ x^T & t^2 \end{pmatrix},$$
because $X \succeq 0 \iff t^2 - x^T x \geq 0$.
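The equivalent characterizations of positive semidefiniteness can be checked numerically. A minimal sketch with NumPy, where the matrix $A$ is an arbitrary illustrative choice built as $P^T P$ (characterization 3):

```python
import numpy as np

# An arbitrary PSD matrix, built as A = P^T P (characterization 3)
P = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0]])
A = P.T @ P  # 3x3, symmetric, positive semidefinite by construction

# 1. A is symmetric with non negative eigenvalues (up to tolerance)
eigvals = np.linalg.eigvalsh(A)
assert np.allclose(A, A.T)
assert np.all(eigvals >= -1e-9)

# 2. x^T A x >= 0, checked on randomly sampled x in R^n
rng = np.random.default_rng(0)
for _ in range(100):
    x = rng.standard_normal(3)
    assert x @ A @ x >= -1e-9
```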
Dual cones

Let $K$ be a closed convex cone. Its dual
$$K^* := \{y \in E : (x, y) \geq 0 \text{ for all } x \in K\}$$
is also a closed convex cone. If moreover $K$ is pointed and full dimensional, then so is $K^*$. We have $(K^*)^* = K$.

Proof: The inclusion $K \subset (K^*)^*$ is obvious. The reverse inclusion $(K^*)^* \subset K$ requires the separation lemma:
Dual cones

Lemma: Let $K$ be a closed convex cone, and let $z \notin K$. There exists a hyperplane $H = (\mathbb{R}y)^\perp$ such that $(x, y) \geq 0$ for all $x \in K$ and $(z, y) < 0$.

[Figure: the hyperplane $H = (\mathbb{R}y)^\perp$ separating $z$ from $K$.]

Proof of $K = (K^*)^*$: If $K$ is strictly contained in $(K^*)^*$, there exists $z \in (K^*)^*$ such that $z \notin K$. Because $z \notin K$, applying the separation lemma, there exists $y \in E$ such that $(x, y) \geq 0$ for all $x \in K$ and $(z, y) < 0$. But $(x, y) \geq 0$ for all $x \in K$ means that $y \in K^*$; then $(z, y) < 0$ contradicts the assumption $z \in (K^*)^*$.
Dual cones

The non negative orthant, the second order cone, and the cone of positive semidefinite matrices are self dual: $K^* = K$.

The copositive cone
$$\mathsf{COP} = \{A \in \mathcal{S}^n : x^T A x \geq 0 \text{ for all } x \geq 0\}$$
is closed, convex, pointed, but not self dual. Its dual is the cone of completely positive matrices:
$$\mathsf{POS} = \Big\{A \in \mathcal{S}^n : A = \sum_{i=1}^l x_i x_i^T \text{ for some } l \geq 1,\ x_i \in \mathbb{R}^n_{\geq 0}\Big\}.$$
(It is easy to prove that $\mathsf{COP} = \mathsf{POS}^*$. It is not so easy to prove that $\mathsf{POS}$ is closed, which is needed for $\mathsf{COP}^* = (\mathsf{POS}^*)^* = \mathsf{POS}$.)
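One half of the self-duality of the positive semidefinite cone, namely $\langle A, B\rangle = \mathrm{Tr}(AB) \geq 0$ for $A, B \succeq 0$ (i.e. $K \subset K^*$), is easy to test numerically. A sketch with randomly generated PSD matrices (sizes and seeds are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
for _ in range(200):
    # Random PSD matrices via the factorization A = P^T P
    P = rng.standard_normal((n, n))
    Q = rng.standard_normal((n, n))
    A, B = P.T @ P, Q.T @ Q
    # Tr(AB) = ||Q P^T||_F^2 >= 0: the PSD cone lies inside its dual
    assert np.trace(A @ B) >= -1e-9
```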
Conic linear programs

Notations: $E$ and $F$ are finite dimensional Euclidean spaces. We take the same notation $(x, y)$ for their inner products. $K$ is a pointed, closed, convex, full dimensional cone in $E$. $T : E \to F$ is a linear map, $c \in E$, $b \in F$.

A conic linear program in primal form:
$$p^* = \sup\{(c, x) : Tx = b,\ x \in K\}.$$
We remark that $p^*$ is the supremum of a linear form over the intersection of $K$ with an affine subspace of $E$.
Terminology

$$p^* = \sup\{(c, x) : Tx = b,\ x \in K\}$$
- $(c, x)$ is the objective function.
- $\{x : Tx = b,\ x \in K\}$ is the feasible region.
- If $x$ is in the feasible region, it is called a feasible point and $(c, x)$ is called its objective value.
- $x$ is an optimal point if it is feasible and its objective value is equal to $p^*$.
- $x$ is strictly feasible if $x$ is feasible and belongs to the interior of $K$.
- If the program is infeasible, i.e. the feasible region is empty, we set $p^* = -\infty$. On the other hand we can have $p^* = +\infty$, in which case the program is said to be unbounded.
The dual program

Recall the primal program:
$$p^* = \sup\{(c, x) : Tx = b,\ x \in K\}.$$
Then the dual program is:
$$d^* = \inf\{(y, b) : T^*y - c \in K^*\},$$
where $T^* : F \to E$ is the adjoint operator: $(Tx, y) = (x, T^*y)$ for all $x \in E$, $y \in F$.
Equivalence of primal and dual forms

$$p^* = \sup\{(c, x) : Tx = b,\ x \in K\} \qquad d^* = \inf\{(y, b) : T^*y - c \in K^*\}$$

These two programs look different, but in fact the two forms are equivalent. Indeed,
$$d^* = \inf\{(y, b) : z = T^*y - c,\ z \in K^*\}$$
and $V := \{T^*y - c : y \in F\}$ is an affine subspace of $E$. Moreover we can assume $b \in \mathrm{Im}(T)$ (otherwise $\{x : Tx = b\}$ is empty), say $b = T\gamma$. Then
$$(y, b) = (y, T\gamma) = (T^*y, \gamma) = (z, \gamma) + (c, \gamma),$$
so that
$$d^* = (c, \gamma) + \inf\{(z, \gamma) : z \in V \cap K^*\}.$$
And the dual of $d^*$ is of course $p^*$ (exercise!).
Weak duality

$$p^* = \sup\{(c, x) : Tx = b,\ x \in K\} \qquad d^* = \inf\{(y, b) : T^*y - c \in K^*\}$$

Theorem: Let $x$ be primal feasible and $y$ be dual feasible. Then:
1. (weak duality) $(c, x) \leq (y, b)$; consequently $p^* \leq d^*$.
2. (complementary slackness) If $x$ is primal optimal, $y$ is dual optimal and $p^* = d^*$, then $(c, x) = (y, b)$ and $(T^*y - c, x) = 0$.
3. (optimality criterion) If $(c, x) = (y, b)$ or $(T^*y - c, x) = 0$, then $x$ is primal optimal, $y$ is dual optimal and $p^* = d^*$.

Proof: we compute the difference of the two objectives:
$$(y, b) - (c, x) = (y, Tx) - (c, x) = (T^*y, x) - (c, x) = (T^*y - c, x) \geq 0.$$
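The one-line proof above can be replayed numerically. A sketch with $K$ the non negative orthant, so that $T$ is a matrix $A$ and $K^* = K$; all data below are arbitrary illustrative choices:

```python
import numpy as np

# Illustrative conic program data with K the non negative orthant
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([2.0, 2.0])
c = np.array([1.0, 0.0, 1.0])

x = np.array([1.0, 1.0, 1.0])   # primal feasible: Ax = b, x >= 0
y = np.array([2.0, 2.0])        # dual feasible:  A^T y - c >= 0
assert np.allclose(A @ x, b) and np.all(x >= 0)
assert np.all(A.T @ y - c >= 0)

# Weak duality: (y, b) - (c, x) = (A^T y - c, x) >= 0
gap = y @ b - c @ x
assert np.isclose(gap, (A.T @ y - c) @ x)
assert gap >= 0
```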
Comments on weak duality

The proof is very easy but it has many practical applications:
- From 1): $(c, x) \leq (y, b)$, so any dual feasible point leads to an upper bound for $p^*$. Often $p^*$ cannot be computed exactly but can be approximated this way.
- Complementary slackness, $(T^*y - c, x) = 0$, gives useful information on the primal optimal points, e.g. if one wants to prove uniqueness.
- We say there is no duality gap, or that strong duality holds, if $p^* = d^*$. It is not always the case! We shall see later a sufficient condition (Slater condition) for strong duality.
Semidefinite programming

$E = \mathcal{S}^n$, $\langle A, B\rangle = \mathrm{Tr}(AB)$, $K = \mathcal{S}^n_{\succeq 0}$, $K^* = K$. $F = \mathbb{R}^m$, $(a, b) = a^T b$; $b \in \mathbb{R}^m$, $C \in \mathcal{S}^n$. $T$ is defined by $m$ symmetric matrices $A_1, \ldots, A_m$:
$$T : \mathcal{S}^n \to \mathbb{R}^m, \quad X \mapsto (\langle A_j, X\rangle)_{1 \leq j \leq m} \qquad\qquad T^* : \mathbb{R}^m \to \mathcal{S}^n, \quad y \mapsto \sum_{i=1}^m y_i A_i$$

$$p^* = \sup\{\langle C, X\rangle : \langle A_j, X\rangle = b_j\ (j \in [m]),\ X \succeq 0\}$$
$$d^* = \inf\Big\{\sum_{j=1}^m b_j y_j : \sum_{j=1}^m y_j A_j - C \succeq 0\Big\}$$
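The adjoint identity $(TX, y) = \langle X, T^*y\rangle$, i.e. $\sum_j y_j \langle A_j, X\rangle = \langle X, \sum_j y_j A_j\rangle$, can be sanity-checked on random symmetric data. A minimal sketch (dimensions and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 2

def sym(M):
    # Symmetrize an arbitrary square matrix
    return (M + M.T) / 2

# Random symmetric data A_1, ..., A_m, X, and a vector y
A_list = [sym(rng.standard_normal((n, n))) for _ in range(m)]
X = sym(rng.standard_normal((n, n)))
y = rng.standard_normal(m)

TX = np.array([np.trace(Aj @ X) for Aj in A_list])   # T : S^n -> R^m
Tstar_y = sum(yj * Aj for yj, Aj in zip(y, A_list))  # T^* : R^m -> S^n

# Adjoint identity: (TX, y) = <X, T^* y>
assert np.isclose(TX @ y, np.trace(X @ Tstar_y))
```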
Block structure

In many situations, $C$ and the $A_j$ have a common block structure:
$$C = \mathrm{Diag}(C_1, \ldots, C_k), \quad A_j = \mathrm{Diag}(A_{j1}, \ldots, A_{jk}).$$
Then, in $p^*$ the coefficients of $X$ outside the blocks do not enter into play and so can be set to zero: in other words, $X$ can be assumed to have the same block structure:
$$X = \mathrm{Diag}(X_1, \ldots, X_k), \quad \text{and } X \succeq 0 \iff X_i \succeq 0\ (i \in [k]).$$
Then
$$p^* = \sup\Big\{\sum_{i=1}^k \langle C_i, X_i\rangle : \sum_{i=1}^k \langle A_{ji}, X_i\rangle = b_j\ (j \in [m]),\ X_i \succeq 0\ (i \in [k])\Big\}$$
$$d^* = \inf\Big\{\sum_{j=1}^m b_j y_j : \sum_{j=1}^m y_j A_{j1} - C_1 \succeq 0,\ \ldots,\ \sum_{j=1}^m y_j A_{jk} - C_k \succeq 0\Big\}$$
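The fact that $X \succeq 0 \iff X_i \succeq 0$ for each block rests on the eigenvalues of a block diagonal matrix being the union of the blocks' eigenvalues. A quick numerical check, assuming SciPy is available for `block_diag` (block sizes and seed are arbitrary):

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(3)

# Two PSD blocks built as P^T P (illustrative)
P1 = rng.standard_normal((2, 2))
P2 = rng.standard_normal((3, 3))
X1, X2 = P1.T @ P1, P2.T @ P2
X = block_diag(X1, X2)

# The spectrum of X is the union of the blocks' spectra,
# so X >= 0 exactly when X1 >= 0 and X2 >= 0
ev_X = np.sort(np.linalg.eigvalsh(X))
ev_blocks = np.sort(np.concatenate([np.linalg.eigvalsh(X1),
                                    np.linalg.eigvalsh(X2)]))
assert np.allclose(ev_X, ev_blocks)
assert np.all(ev_X >= -1e-9)
```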
Linear programming

$E = \mathbb{R}^n$, $(a, b) = a^T b$, $K = \mathbb{R}^n_{\geq 0} = K^*$ is the non negative orthant. $F = \mathbb{R}^m$, $c \in \mathbb{R}^n$, $b \in \mathbb{R}^m$. $T$ is defined by a matrix $A = (a_{ij}) \in \mathbb{R}^{m \times n}$:
$$T : \mathbb{R}^n \to \mathbb{R}^m, \quad x \mapsto Ax \qquad\qquad T^* : \mathbb{R}^m \to \mathbb{R}^n, \quad y \mapsto A^T y$$

$$p^* = \sup\{c^T x : Ax = b,\ x \geq 0\} \qquad d^* = \inf\{b^T y : A^T y - c \geq 0\}$$
Linear programming

$$p^* = \sup\Big\{\sum_{i=1}^n c_i x_i : \sum_{i=1}^n a_{ji} x_i = b_j\ (j \in [m]),\ x_i \geq 0\ (i \in [n])\Big\}$$
$$d^* = \inf\Big\{\sum_{j=1}^m b_j y_j : \sum_{j=1}^m a_{ji} y_j \geq c_i\ (i \in [n])\Big\}$$

If the feasible region is bounded, it is a polytope. So it has only finitely many extreme points, its vertices, and the optimal value is attained at one of them. Moreover there is no duality gap: $p^* = d^*$. A linear program is a special case of a semidefinite program, corresponding to the case of diagonal matrices $C$ and $A_j$.
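The absence of a duality gap for linear programs can be observed on a small instance, assuming SciPy's `linprog` is available (the instance below is an arbitrary illustrative choice; `linprog` minimizes, so the primal objective is negated):

```python
import numpy as np
from scipy.optimize import linprog

# Primal: max c^T x  s.t.  Ax = b, x >= 0 (illustrative data)
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([2.0, 2.0])
c = np.array([1.0, 0.0, 1.0])

# linprog minimizes, so negate c; default bounds already enforce x >= 0
primal = linprog(-c, A_eq=A, b_eq=b)

# Dual: min b^T y  s.t.  A^T y >= c, y free
dual = linprog(b, A_ub=-A.T, b_ub=-c, bounds=(None, None))

p_star, d_star = -primal.fun, dual.fun
assert np.isclose(p_star, d_star)  # no duality gap for LP
```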
Strong duality

Recall
$$p^* = \sup\{(c, x) : x \in K,\ Tx = b\} \qquad d^* = \inf\{(b, y) : T^*y - c \in K^*\}$$

Theorem: (Slater condition for strong duality)
- If the dual program is bounded from below and is strictly feasible, then the primal program has an optimal solution and $p^* = d^*$.
- If the primal program is bounded from above and is strictly feasible, then the dual program has an optimal solution and $p^* = d^*$.

$d^*$ strictly feasible: $\exists y : T^*y - c \in \mathrm{int}(K^*)$. $p^*$ strictly feasible: $\exists x \in \mathrm{int}(K) : Tx = b$.
Strong duality: sketch of proof

$$p^* = \sup\{(c, x) : x \in K,\ Tx = b\} \qquad d^* = \inf\{(y, b) : T^*y - c \in K^*\}$$

Let $M := \{T^*y - c : (y, b) \leq d^*\}$. $M$ is convex and non empty ($b \neq 0$; $M$ is the affine image of a halfspace).

Claim: $M \cap \mathrm{int}(K^*) = \emptyset$.

Claim: (1) $\exists x \neq 0 : (x, z) \leq (x, y)$ for all $z \in M$, $y \in K^*$.

Two convex bodies whose interiors do not intersect are separated by a hyperplane.

[Figure: the hyperplane $\{t : (x, t) = a\}$ separating $M$ from $K^*$.]
Strong duality: sketch of proof

$$M := \{T^*y - c : (y, b) \leq d^*\} \qquad (1)\ \exists x \neq 0 : (x, z) \leq (x, y) \text{ for all } z \in M,\ y \in K^*$$

Claim: $x \in K$. If $x \notin (K^*)^* = K$, there is $y \in K^*$ s.t. $(y, x) < 0$. As $\lambda \to +\infty$, $(\lambda y, x) \to -\infty$, which contradicts (1).
Strong duality: sketch of proof

$$M := \{T^*y - c : (y, b) \leq d^*\} \qquad (1)\ \exists x \neq 0 : (x, z) \leq (x, y) \text{ for all } z \in M,\ y \in K^*$$

Claim: $\exists \mu > 0 : Tx = \mu b$ and $(c, x) \geq \mu d^*$.
1. (1) with $y = 0$ shows $(x, z) \leq 0$ for all $z \in M$.
2. $(y, b) \leq d^* \Rightarrow T^*y - c \in M \Rightarrow (x, T^*y - c) \leq 0 \Rightarrow (y, Tx) \leq (x, c)$. This shows an inclusion between the half spaces defined by $b, d^*$ and by $Tx, (x, c)$. So $\exists \mu \geq 0 : Tx = \mu b$.
3. Strict feasibility of the dual program rules out $\mu = 0$. Indeed, there is $y$ s.t. $T^*y - c \in \mathrm{int}(K^*)$, so $(x, T^*y - c) > 0$, so $(y, Tx) > (x, c)$. If $Tx = 0$ we would have $0 > (x, c)$. But we have seen $(y, b) \leq d^* \Rightarrow (y, Tx) = 0 \leq (x, c)$.
4. From 2., $d^* \leq \mu^{-1}(c, x)$.
Strong duality: sketch of proof

$x \in K$, $\mu > 0$: $Tx = \mu b$ and $(c, x) \geq \mu d^*$.

Claim: $x^* := x/\mu$ is primal optimal and $(c, x^*) = p^* = d^*$. Indeed, we have seen that $x^* \in K$ and that $Tx^* = b$, so $x^*$ is primal feasible. We also have $(x^*, c) \geq d^*$, while weak duality shows $(x^*, c) \leq p^* \leq d^*$; so equality holds and $(x^*, c) = p^* = d^*$.