4TE3/6TE3. Algorithms for. Continuous Optimization

Similar documents
Nonlinear Programming

I.3. LMI DUALITY. Didier HENRION EECI Graduate School on Control Supélec - Spring 2010

Nonlinear Optimization

Chap 2. Optimality conditions

Convex Optimization & Lagrange Duality

5. Duality. Lagrangian

Lecture 18: Optimization Programming

4TE3/6TE3. Algorithms for. Continuous Optimization

Convex Optimization M2

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization

Convex Optimization Boyd & Vandenberghe. 5. Duality

Lecture: Duality.

Lecture: Duality of LP, SOCP and SDP

14. Duality. ˆ Upper and lower bounds. ˆ General duality. ˆ Constraint qualifications. ˆ Counterexample. ˆ Complementary slackness.

EE/AA 578, Univ of Washington, Fall Duality

Duality. Lagrange dual problem weak and strong duality optimality conditions perturbation and sensitivity analysis generalized inequalities

Optimization for Machine Learning

Optimality, Duality, Complementarity for Constrained Optimization

Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem

Motivation. Lecture 2 Topics from Optimization and Duality. network utility maximization (NUM) problem:

1 Review of last lecture and introduction

The Karush-Kuhn-Tucker (KKT) conditions

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Conic Linear Optimization and its Dual. yyye

Optimality Conditions for Constrained Optimization

Lagrangian Duality Theory

Introduction to Nonlinear Stochastic Programming

Lagrange Duality. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST)

Optimization for Communications and Networks. Poompat Saengudomlert. Session 4 Duality and Lagrange Multipliers

Duality Theory of Constrained Optimization

Generalization to inequality constrained problem. Maximize

Lectures 9 and 10: Constrained optimization problems and their optimality conditions

UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems

Introduction to Machine Learning Lecture 7. Mehryar Mohri Courant Institute and Google Research

Constrained Optimization and Lagrangian Duality

4. Algebra and Duality

Numerical Optimization

CSCI : Optimization and Control of Networks. Review on Convex Optimization

Karush-Kuhn-Tucker Conditions. Lecturer: Ryan Tibshirani Convex Optimization /36-725

A Brief Review on Convex Optimization

Introduction to Mathematical Programming IE406. Lecture 10. Dr. Ted Ralphs

Lecture 3. Optimization Problems and Iterative Algorithms

Lagrangian Duality and Convex Optimization

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 4. Subgradient

Linear and Combinatorial Optimization

Support Vector Machines: Maximum Margin Classifiers

CO 250 Final Exam Guide

Lecture 6: Conic Optimization September 8

Lagrange duality. The Lagrangian. We consider an optimization program of the form

Finite Dimensional Optimization Part III: Convex Optimization 1

Summer School: Semidefinite Optimization

Lecture 5. Theorems of Alternatives and Self-Dual Embedding

CONSTRAINT QUALIFICATIONS, LAGRANGIAN DUALITY & SADDLE POINT OPTIMALITY CONDITIONS

LP Duality: outline. Duality theory for Linear Programming. alternatives. optimization I Idea: polyhedra

The Karush-Kuhn-Tucker conditions

Linear and non-linear programming

Homework Set #6 - Solutions

Convex Optimization and Modeling

Lagrange Relaxation and Duality

10 Numerical methods for constrained problems

Inequality Constraints

Lecture 7: Convex Optimizations

Additional Homework Problems

Part IB Optimisation

CS-E4830 Kernel Methods in Machine Learning

The fundamental theorem of linear programming

Enhanced Fritz John Optimality Conditions and Sensitivity Analysis

Convex Optimization. Dani Yogatama. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. February 12, 2014

subject to (x 2)(x 4) u,

ICS-E4030 Kernel Methods in Machine Learning

On the Method of Lagrange Multipliers

Lecture 8. Strong Duality Results. September 22, 2008

LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE

Example: feasibility. Interpretation as formal proof. Example: linear inequalities and Farkas lemma

Midterm Review. Yinyu Ye Department of Management Science and Engineering Stanford University Stanford, CA 94305, U.S.A.

Support Vector Machines

Convex Optimization Lecture 6: KKT Conditions, and applications

Convex Optimization Theory. Chapter 5 Exercises and Solutions: Extended Version

LECTURE 10 LECTURE OUTLINE

In view of (31), the second of these is equal to the identity I on E m, while this, in view of (30), implies that the first can be written

Optimization Problems with Constraints - introduction to theory, numerical Methods and applications

Miscellaneous Nonlinear Programming Exercises

Support Vector Machines for Regression

Nonlinear Optimization: What s important?

Lecture Note 5: Semidefinite Programming for Stability Analysis

Some Properties of the Augmented Lagrangian in Cone Constrained Optimization

Introduction to Machine Learning Prof. Sudeshna Sarkar Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Lecture Notes on Support Vector Machine

Solving generalized semi-infinite programs by reduction to simpler problems.

Math 5311 Constrained Optimization Notes

ISM206 Lecture Optimization of Nonlinear Objective with Linear Constraints

On the Existence and Convergence of the Central Path for Convex Programming and Some Duality Results

Date: July 5, Contents

Optimisation in Higher Dimensions

A semidefinite relaxation scheme for quadratically constrained quadratic problems with an additional linear constraint

A DUALITY THEOREM FOR NON-LINEAR PROGRAMMING* PHILIP WOLFE. The RAND Corporation

Econ 508-A FINITE DIMENSIONAL OPTIMIZATION - NECESSARY CONDITIONS. Carmen Astorne-Figari Washington University in St. Louis.

Nonlinear Programming and the Kuhn-Tucker Conditions

EE364a Review Session 5

Global Optimality Conditions in Maximizing a Convex Quadratic Function under Convex Quadratic Constraints

Transcription:

4TE3/6TE3 Algorithms for Continuous Optimization (Duality in Nonlinear Optimization ) Tamás TERLAKY Computing and Software McMaster University Hamilton, January 2004 terlaky@mcmaster.ca Tel: 27780

Optimality conditions for constrained convex optimization problems (CO) min f(x) s.t. g j (x) 0, x C. j = 1,, m F = {x C g j (x) 0, j J}. Let C 0 denote the relative interior of the convex set C. Definition 1 A vector (point) x 0 C 0 is called a Slater point of (CO) if g j (x 0 ) < 0, g j (x 0 ) 0, for all j where g j is nonlinear, for all j where g j is linear. (CO) is Slater regular or (CO) satisfies the Slater condition (in other words, (CO) satisfies the Slater constraint qualification). 1

Ideal Slater Point Some constraint functions g j (x) might take the value zero for all feasible points. Such constraints are called singular while the others are called regular. J s = {j J g j (x) = 0 for all x F}, J r = J J s = {j J g j (x) < 0 for some x F}. Remark: Note, that if (CO) is Slater regular, then all singular functions must be linear. Definition 2 A vector (point) x C 0 is called an Ideal Slater point of the convex optimization problem (CO), if x is a Slater point and g j (x ) < 0 for all j J r, g j (x ) = 0 for all j J s. Lemma 1 If the problem (CO) is Slater regular then there exists an ideal Slater point x F. 2

Convex Farkas Lemma Theorem 1 Let U IR n be a convex set and a point w IR n with w / U be given. Then there is a separating hyperplane a T w α, with a IR n, α IR such that a T u α for all u U but U is not a subset of the hyperplane {x a T x = α}. Note that the last property says that there is a u U such that a T u > α. Lemma 2 (Farkas) The convex optimization problem (CO) is given and we assume that the Slater regularity condition is satisfied. The inequality system f(x) < 0 g j (x) 0, x C. j = 1,, m (1) has no solution if and only if there exists a vector y = (y 1,, y m ) 0 such that f(x) + y j g j (x) 0 for all x C. (2) The systems (1) and (2) are called alternative systems, i.e. exactly one of them has a solution. 3

Proof of the Convex Farkas Lemma If (1) is solvable, then (2) cannot hold. To prove the other side: let us assume that (1) has no solution. With u = (u 0,, u m ), we define the set U IR m+1 U = {u x C with u 0 > f(x), u j g j (x) if j J r, u j = g j (x) if j J s }. U is convex (note that due to the Slater condition singular functions are linear) due to the infeasibility of (1) it does not contain the origin. Due to the separation Theorem 1 there exists a separating hyperplane defined by (y 0, y 1,, y m ) and α = 0 such that y j u j 0 for all u U (3) j=0 and for some u U one has j=0 y j u j > 0. (4) The rest of the proof is divided into four parts. I. First we prove that y 0 0 and y j 0 for all j J r. II. Secondly we establish that (3) holds for u = (f(x), g 1 (x),, g m (x)) if x C. III. Then we prove that y 0 must be positive. IV. Finally, it is shown by using induction that y j > 0 for all j J s can be reached. 4

Proof: Steps I and II I. First we show that y 0 0 and y j 0 for all j J r. Let us assume that y 0 < 0. Let us take an arbitrary (u 0, u 1,, u m ) U. By definition (u 0 + λ, u 1,, u m ) U for all λ 0. Hence by (3) one has λy 0 + j=0 y j u j 0 for all λ 0. For sufficiently large λ the left hand side is negative, which is a contradiction, i.e. y 0 must be nonnegative. The proof of the nonnegativity of all y j as j J r goes analogously. II. Secondly we establish that y 0 f(x) + y j g j (x) 0 for all x C. (5) This follows from the observation that for all x C and for all λ > 0 one has u = (f(x) + λ, g 1 (x),, g m (x)) U, thus y 0 (f(x) + λ) + y j g j (x) 0 for all x C. Taking the limit as λ 0 the claim follows. 5

Proof: Step III III. Thirdly we show that y 0 > 0. The proof is by contradiction. We already know that y 0 0. Let us assume to the contrary that y 0 = 0. Hence from (5) we have y j g j (x) + y j g j (x) = y j g j (x) 0 for all x C. j J r j J s Taking an ideal Slater point x C 0 one has whence g j (x ) = 0 if j J s, j J r y j g j (x ) 0. Since y j 0 and g j (x ) < 0 for all j J r, this implies y j = 0 for all j J r. This results in j J s y j g j (x) 0 for all x C. (6) Now, from (4), with x C such that u j = g j ( x) if i J s we have j J s y j g j ( x) > 0. (7) Because the ideal Slater point x is in the relative interior of C there exist a vector x C and 0 < λ < 1 such that x = λ x + (1 λ) x. Using that g j (x ) = 0 for j J s and that the singular functions are linear one gets 6

Proof: Step III cntd. 0 = j J s y j g j (x ) = j J s y j g j (λ x + (1 λ) x) = λ j J s y j g j ( x) + (1 λ) j J s y j g j ( x) > (1 λ) j J s y j g j ( x). Here the last inequality follows from (7). The inequality (1 λ) j J s y j g j ( x) < 0 contradicts (6). Hence we have proved that y 0 > 0. At this point we have (5) with y 0 > 0 and y j 0 for all j J r. Dividing by y 0 > 0 in (5) and by defining y j := y j y 0 for all j J we obtain f(x) + y j g j (x) 0 for all x C. (8) We finally show that y may be taken such that y j > 0 for all j J s. 7

Proof: Step IV IV. To complete the proof we show by induction on the cardinality of J s that one can make y j nonnegative for all j J s. Observe that if J s = then we are done. If J s = 1 then we apply the results proved till this point to the inequality system g s (x) < 0, g j (x) 0, j J r, x C (9) where {s} = J s. The system (9) has no solution, it satisfies the Slater condition, and therefore there exists a ŷ IR m 1 such that g s (x) + j J r ŷ j g j (x) 0 for all x C, (10) where ŷ s > 0 and ŷ j 0 for all j J r. Adding a sufficiently large positive multiple of (10) to (8) one obtains a positive coefficient for g s (x). The general inductive step goes analogously. 8

Proof: Step IV cntd. Assuming that the result is proved if J s = k then the result is proved for the case J s = k+1. Let s J s then J s \ {s} = k, and hence the inductive assumption applies to the system g s (x) < 0 g j (x) 0, j J s \ {s}, g j (x) 0, j J r, x C (11) By construction the system (11) has no solution, it satisfies the Slater condition, and by the inductive assumption we have a ŷ IR m 1 such that g s (x) + ŷ j g j (x) 0 for all x C. (12) j J r J s \{s} where ŷ j > 0 for all j J s and ŷ j 0 for all j J r. Adding a sufficiently large multiple of (12) to (8) one have the desired nonnegative multipliers. Remark: Note, that finally we proved slightly more than was stated. We have proved that the multipliers of all the singular constraints can be made strictly positive.!!!! Srict complementarity in LO!!!! Goldman-Tucker Theorem 9

Karush Kuhn Tucker theory Langrangean, Saddle point The Lagrange function: L(x, y) := f(x) + y j g j (x) (13) where x C and y 0. Note that for fixed y the Lagrangean is convex in x; for fixed x. it is linear in y. Definition 3 A vector pair ( x, ȳ) IR n+m, x C and ȳ 0 is called a saddle point of the Lagrange function L if for all x C and y 0. L( x, y) L( x, ȳ) L(x, ȳ) (14) 10

A saddle point lemma Lemma 3 The vector ( x, ȳ) IR n+m, x C and ȳ 0 is a saddle point of L(x, y) if and only if inf sup L(x, y) = L( x, ȳ) = sup inf L(x, y). (15) x C y 0 y 0 x C Proof: The saddle point inequality (14) easily follows from (15). On the other hand, for any (ˆx, ŷ) one has inf L(x, ŷ) L(ˆx, ŷ) sup L(ˆx, y), x C y 0 hence one can take the supremum of the left hand side and the infimum of the right hand side resulting in sup y 0 inf L(x, y) inf sup L(x, y). (16) x C x C y 0 Using the saddle point inequality (14) one obtains inf sup L(x, y) sup L( x, y) L( x, ȳ) inf L(x, ȳ) supinf L(x, y). x C y 0 y 0 x C y 0 x C (17) Combining (17) and (16) the equality (15) follows. 11

Karush-Kuhn-Tucker Theorem Theorem 2 The problem (CO) is given. Assume that the Slater regularity condition is satisfied. The vector x is an optimal solution of (CO) if and only if there is a vector ȳ such that ( x, ȳ) is a saddle point of the Lagrange function L. Proof: Easy: if ( x, ȳ) is a saddle point of L(x, y) then x is optimal for (CO). The proof of this part does not need any regularity condition. From the saddle point inequality (14) one has f( x) + y j g j ( x) f( x) + ȳ j g j ( x) f(x) + ȳ j g j (x) for all y 0 and for all x C. From the first inequality g j ( x) 0 for all j = 1,, m follows, hence x F is feasible for (CO). Taking the two extreme sides of the above inequality and substituting y = 0 we have f( x) f(x) + for all x F, i.e. x is optimal. ȳ j g j (x) f(x) 12

KKT proof cntd. To prove the other direction we need Slater regularity and the Convex Farkas Lemma 2. Let us take an optimal solution x of the convex optimization problem (CO). Then the inequality system f(x) f( x) < 0 g j (x) 0, x C j = 1,, m is infeasible. Applying the Convex Farkas Lemma 2 one has ȳ 0 such that f(x) f( x) + ȳ j g j (x) 0 for all x C. Using that x is feasible one easily derive the saddle point inequality f( x) + y j g j ( x) f( x) f(x) + which completes the proof. ȳ j g j (x) 13

KKT Corollaries Corollary 1 Under the assumptions of Theorem 2 the vector x C is an optimal solution of (CO) if and only if there exists a ȳ 0 such that (i) (ii) m f( x) = min {f(x) + ȳ j g j (x)} x C m ȳ j g j ( x) = max { y j g j ( x)}. y 0 and Corollary 2 Under the assumptions of Theorem 2 the vector x F is an optimal solution of (CO) if and only if there exists a ȳ 0 such that (i) (ii) m f( x) = min {f(x) + ȳ j g j (x)} x C ȳ j g j ( x) = 0. and Corollary 3 Let us assume that C = IR n and the functions f, g 1,, g m are continuously differentiable functions. Under the assumptions of Theorem 2 the vector x F is an optimal solution of (CO) if and only if there exists a ȳ 0 such that (i) 0 = f( x) + ȳ j g j ( x) and (ii) ȳ j g j ( x) = 0. 14

KKT point Definition 4 Let us assume that C = IR n and the functions f, g 1,, g m are continuously differentiable functions. The vector ( x, ȳ) IR n+m is called a Karush Kuhn Tucker (KKT) point of (CO) if (i) g j ( x) 0, for all j J, (ii) 0 = f( x) + (iii) (iv) ȳ 0. ȳ j g j ( x) = 0, ȳ j g j ( x) Corollary 4 Let us assume that C = IR n and the functions f, g 1,, g m are continuously differentiable convex functions and the assumptions of Theorem 2 hold. Let the vector x F be a KKT point, then x is an optimal solution of (CO). 15

Duality in CO Lagrange dual Definition 5 Denote The problem m ψ(y) = inf {f(x) + y j g j (x)}. x C (LD) sup ψ(y) y 0 is called the Lagrange Dual of problem (CO). Lemma 4 The Lagrange Dual (LD) of (CO) is a convex optimization problem, even if the functions f, g 1,, g m are not convex. Proof: ψ(y) is concave! Let ȳ, ŷ 0 and 0 λ 1. m ψ(λȳ + (1 λ)ŷ) = inf {f(x) + (λȳ j + (1 λ)ŷ j )g j (x)} x C = inf x C {λ[f(x) + m inf x C {λ[f(x) + m = λψ(ȳ) + (1 λ)ψ(ŷ). ȳ j g j (x)] + (1 λ)[f(x) + ŷ j g j (x)]} ȳ j g j (x)]} + inf x C {(1 λ)[f(x) + m ŷ j g j (x)]} 16

Results on the Lagrange dual Theorem 3 (Weak duality) If x is a feasible solution of (CO) and ȳ 0 then ψ(ȳ) f( x) and equality hold if and only if m inf {f(x) + ȳ j g j (x)} = f( x). x C Proof: ψ(ȳ) = inf x C {f(x) + m ȳ j g j (x)} f( x) + ȳ j g j ( x) f( x). Equality holds iff inf x C {f(x)+ m ȳjg j (x)} = f( x) and hence ȳ j g j (x) = 0 for all j J. Corollary 5 If x is a feasible solution of (CO), ȳ 0 and ψ(ȳ) = f( x) then the vector x is an optimal solution of (CO) and ȳ is optimal for (LD). Further if the functions f, g 1,, g m are continuously differentiable then ( x, ȳ) is a KKT-point. Theorem 4 (Strong duality) Assume that (CO) satisfies the Slater regularity condition. Let x be a feasible solution of (CO). The vector x is an optimal solution of (CO) if and only if there exists a ȳ 0 such that ȳ is an optimal solution of (LD) and ψ(ȳ) = f( x). 17

Wolfe dual Definition 6 Assume that C = IR n and the functions f, g 1,, g m are continuously differentiable and convex. The problem (WD) sup{f(x) + f(x) + y j g j (x)} y j g j (x) = 0, y 0 is called the Wolfe Dual of the convex optimization problem (CO). Warning! Remember, we are only allowed to form the Wolfe dual of a nonlinear optimization problem if it is convex! For nonconvex problems one has to work with the Lagrange dual. 18

Examples for dual problems Linear optimization (LO) min{c T x Ax = b, x 0}. g j (x) = (a j ) T x b j if j = 1,, m; g j (x) = ( a j m ) T x + b j m if j = m + 1,,2m; g j (x) = x j 2m if j = 2m + 1,,2m + n. Denote Lagrange multipliers by y, y + and s, then the Wolfe dual (WD) of (LO) is: max c T x + (y ) T (Ax b) + (y + ) T ( Ax + b) + s T ( x) c + A T y A T y + s = 0, y 0, y + 0, s 0. Substitute c = A T y + A T y + + s and let y = y + y then max b T y A T y + s = c, s 0. Note that the KKT conditions provide the well known complementary slackness condition x T s = 0. 19

Quadratic optimization (QO) min{c T x + 1 2 xt Qx Ax b, x 0}. g j (x) = ( a j ) T x + b j if j = 1,, m; g j (x) = x j m if j = m + 1,, m + n. Lagrange multipliers: y and s. The Wolfe dual (WD) of (QO) is: max c T x + 1 2 xt Qx + y T ( Ax + b) + s T ( x) c + Qx A T y s = 0, y 0, s 0. Substitute c = Qx + A T y + s in the objective. max b T y 1 2 xt Qx Qx + A T y + s = c, y 0, s 0. Q = D T D (e.g. Cholesky), let z = Dx. The following (QD) dual problem is obtained: max b T y 1 2 zt z D T z + A T y + s = c, y 0, s 0. 20

Constrained maximum likelihood estimation Finite set of sample points x i, (1 i n). Find the most probable density values that satisfy some linear (e.g. convexity) constraints. Maximize the Likelihood function Π n i=1 x i under the conditions Ax 0, d T x = 1, x 0. The objective can equivalently replaced by n min ln x i. i=1 Lagrange multipliers: y IR m, t IR and s IR n. The Wolfe dual (WD) is: n max ln x i + y T ( Ax) + t(d T x 1) + s T ( x) i=1 X 1 e A T y + td s = 0, y 0, s 0. Multiplying the first constraint by x T one has x T X 1 e x T A T y + tx T d x T s = 0. Using d T x = 1, x T X 1 e = n and the optimality conditions y T Ax = 0, x T s = 0 we have t = n. x is necessarily strictly positive, hence the dual variable s must be zero at optimum. 21

Constrained maximum likelihood estimation: 2 max n i=1 ln x i X 1 e + A T y = nd, y 0. Eliminating the variables x i > 0: x i = 1 nd i a T i y and ln x i = ln(nd i a T i y) i. max n i=1 ln(nd i a T i y) A T y nd, y 0. 22

Example: positive duality gap Duffin s convex optimization problem (CO) min e x 2 s.t. x 2 1 + x2 2 x 1 0 x IR 2. The feasible region is F = {x IR 2 x 1 0, x 2 = 0}. (CO) is not Slater regular. The optimal value of the object function is 1. The Lagrange function is given by L(x, y) = e x 2 + y( x 2 1 + x2 2 x 1). Now, let ǫ = x 2 1 + x2 2 x 1, then x 2 2 2ǫx 1 ǫ 2 = 0. Hence, for any ǫ > 0 we can find x 1 > 0 such that ǫ = x 2 1 + x2 2 x 1 even if x 2 goes to infinity. However, when x 2 goes to infinity e x 2 goes to 0. So, ( ) ψ(y) = inf e x 2 + y x 2 x IR 2 1 + x2 2 x 1 = 0, 23

thus the optimal value of the Lagrange dual (LD) max ψ(y) s.t. y 0 is 0. Nonzero duality gap that equals to 1!

Example: infinite duality gap Duffin s example slightly modified min x 2 s.t. x 2 1 + x2 2 x 1 0. The feasible region is F = {x IR 2 x 1 0, x 2 = 0}. The problem is not Slater regular. The optimal value of the object function is 0. The Lagrange function is given by So, L(x, y) = x 2 + y( ψ(y) = inf x IR 2 { x 2 + y x 2 1 + x2 2 x 1). ( x 21 + x 22 x 1 )} =, thus the optimal value of the Lagrange dual (LD) max ψ(y) s.t. y 0 is, because ψ(y) is minus infinity! 24

Semidefinite optimization Let A 0, A 1,, A n IR m m, let c IR n, x IR n. The primal SDO problem is defined as (PSO) min c T x (18) n s.t. A 0 + A k x k 0, F(x) = A 0 + n k=1 A kx k The dual SDO problem is: k=1 (DSP) max Tr(A 0 Z) (19) s.t. Tr(A k Z) = c k, for all k = 1,, n, Z 0, Theorem 5 (Weak duality) If x IR n is primal feasible and Z IR m m is dual feasible, then c T x Tr(A 0 Z) with equality if and only if F(x)Z = 0. Proof: c T x Tr(A 0 Z) = n k=1 Tr(A k Z)x k Tr(A 0 Z) = Tr(( n k=1 A k x k A 0 )Z) = Tr(F(x)Z) 0. 25

SDO Dual as Lagrange Dual (PSO ) min c T x (20) s.t. F(x) + S = 0 S 0, The Lagrange function L(x, S, Z) is defined on {(x, S, Z) x IR n, S IR m m, S 0, Z IR m m, } L(x, S, Z) = c T x e T (F(x) Z)e + e T (S Z)e, X Z is the Minkowski product, e T (S Z)e = Tr(SZ). n L(x, S, Z) = c T x x k Tr(A k Z) + Tr(A 0 Z) + Tr(SZ). (21) k=1 (DSDL) max {ψ(z) Z IR m m } (22) ψ(z) = min{ L(x, S, Z) x IR n, S IR m m, S 0}. (23) Optimality conditions to get ψ(z): min S Tr(SZ) = { 0 if Z 0, otherwise. (24) 26

SDO dual cntd. If we minimize (23) in x, we need to equate the x gradient of L(x, S, Z) to zero: c k Tr(A k Z) = 0 for all k = 1,, n. (25) Multiplying (25) by x k and summing up c T x n k=1 x k Tr(A k Z) = 0. By combining the last formula and the results presented in (24) and in (25) the simplified form of the Lagrange dual (22), the Lagrange Wolfe dual (DSO) max Tr(A 0 Z) s.t. Tr(A k Z) = c k, for all k = 1,, n, Z 0, follows. It is identical to (19). 27

Duality in cone-linear optimization min c T x Ax b C 1 (26) x C 2. Definition 7 Let C IR n be a convex cone. The dual cone (polar, positive polar) C is defined by C := { z IR n x T z 0 for all x C}. The Dual of a Cone-linear Problem min c T x s Ax + b = 0 s C 1 x C 2. Linear equality constraints s Ax+b = 0 and (s, x) must be in the convex cone C 1 C 2 := {(s, x) s C 1, x C 2 }. The Lagrange function L(s, x, y) is defined on and is given by {(s, x, y) s C 1, x C 2, y IR m } L(s, x, y) = c T x + y T (s Ax + b) = b T y + s T y + x T (c A T y). 28

Thee Lagrange dual of the cone-linear problem where max y IR ψ(y) m ψ(y) = min{ L(s, x, y) s C 1, x C 2 }. The vector s is in the cone C 1, hence { min s T 0 if y C y = 1, s C 1 otherwise. min x T (c A T y) = x C 2 { 0 if c A T y C2, otherwise. By combining these: we have { b ψ(y) = T y if y C1 and c AT y C2, otherwise. Thus the dual of (26) is: max b T y c A T y C2 y C1. 29