Selected Examples of CONIC DUALITY AT WORK

Robust Linear Optimization • Synthesis of Linear Controllers • Matrix Cube Theorem

A. Nemirovski
Arkadi.Nemirovski@isye.gatech.edu
Linear Optimization Problem, its Data and Structure

Linear Optimization problem:
    min_x { c^T x + d : Ax ≤ b }    (LO)
• x ∈ R^n: vector of decision variables,
• c ∈ R^n and d ∈ R form the objective,
• A: an m×n constraint matrix, b ∈ R^m: right hand side.
Problem's structure: its sizes m, n.
Problem's data: (c, d, A, b).

Data Uncertainty
The data of typical real-world LOs are partially uncertain: not known exactly when the problem is being solved.
Sources of data uncertainty:
• Prediction errors. Some of the data entries (future demands, returns, etc.) do not exist when the problem is solved and hence are replaced with their forecasts.
• Measurement errors. Some of the data (parameters of technological devices and processes, contents associated with raw materials, etc.) cannot be measured exactly, and their true values drift around the measured "nominal" values.
• Implementation errors. Some of the decision variables (planned intensities of technological processes, parameters of physical devices we are designing, etc.) cannot be implemented exactly as computed. Implementation errors are equivalent to artificial data uncertainties. Indeed, the impact of implementation errors
    x_j → (1 + ε_j) x_j + δ_j
on the validity of the constraint
    a_i1 x_1 + ... + a_in x_n ≤ b_i
is as if there were no implementation errors, but the data of the constraint were subject to the perturbations
    a_ij → (1 + ε_j) a_ij,  b_i → b_i − Σ_j a_ij δ_j.
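This equivalence between implementation errors and induced data perturbations is easy to check numerically. A minimal sketch (Python with numpy; the data below is synthetic, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
a = rng.normal(size=n)            # constraint row a_i
x = rng.normal(size=n)            # computed decision
b = 1.0                           # right hand side b_i
eps = 0.01 * rng.uniform(-1.0, 1.0, size=n)     # multiplicative errors
delta = 0.001 * rng.uniform(-1.0, 1.0, size=n)  # additive errors

# Left hand side when the decision is implemented with errors:
lhs_impl = a @ ((1.0 + eps) * x + delta)

# Left hand side of the perturbed-data constraint at the nominal x:
a_pert = (1.0 + eps) * a          # a_ij -> (1 + eps_j) a_ij
b_pert = b - a @ delta            # b_i  -> b_i - sum_j a_ij delta_j
lhs_data = a_pert @ x

# The constraint gap is the same either way:
assert np.isclose(lhs_impl - b, lhs_data - b_pert)
```

The gap `lhs_impl − b` of the erroneously implemented decision coincides with the gap `lhs_data − b_pert` of the nominal decision under the perturbed data.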
Data Uncertainty: Traditional Treatment and Dangers
Traditionally,
• small (fractions of a percent) data uncertainty is just ignored; the problem is solved as is, with the nominal data, and the resulting nominal optimal solution is forwarded to the end user;
• large data uncertainty is assigned a probability distribution and treated via Stochastic Programming techniques.
Fact: in many cases, even small data uncertainty can make the nominal solution heavily infeasible and thus practically meaningless.
Example: Antenna Design
[Physics:] The directional density of energy transmitted by a monochromatic antenna placed at the origin is proportional to |D(δ)|^2, where the antenna's diagram D(δ) is a complex-valued function of the 3-D direction (unit 3-D vector) δ.
[Physics:] For an antenna array, a complex antenna comprised of a number of antenna elements, the diagram is
    D(δ) = Σ_j x_j D_j(δ)    (*)
• D_j(·): diagrams of the elements,
• x_j: complex weights, the design parameters responsible for how the elements of the array are invoked.
Antenna Design problem: Given the diagrams D_1(·), ..., D_k(·) and a target diagram D_*(·), find the weights x_j ∈ C such that the synthesized diagram (*) is as close as possible to the target diagram D_*(·).
When D_j(·), D_*(·), same as the weights, are real and closeness is quantified by the uniform norm on a finite grid Γ of directions, Antenna Design becomes the LO problem
    min_{x ∈ R^n, τ} { τ : |D_*(δ) − Σ_j x_j D_j(δ)| ≤ τ, δ ∈ Γ }.
Example: Consider a planar antenna array comprised of 10 elements (a circle surrounded by 9 rings of equal areas) in the plane XY ("Earth's surface"); our goal is to send most of the energy "up," along the 12° cone around the Z-axis.
Diagram of a ring {z = 0, a ≤ sqrt(x² + y²) ≤ b}:
    D_{a,b}(θ) = (1/2) ∫_a^b ∫_0^{2π} r cos( 2π r λ^{−1} cos(θ) cos(ϕ) ) dϕ dr
• θ: altitude angle,
• λ: wavelength.
[Figure: 10 antenna elements, equal areas, outer radius 1 m]
[Figure: diagrams of the elements vs. the altitude angle θ, λ = 50 cm]
Nominal design problem:
    τ_* = min_{x ∈ R^10, τ} { τ : |D_*(θ_i) − Σ_{j=1}^{10} x_j D_j(θ_i)| ≤ τ, 1 ≤ i ≤ 240 },  θ_i = iπ/480
[Figure: target (blue) and nominal optimal (magenta) diagrams, τ_* = 0.0589]
But: The design variables are characteristics of physical devices and as such cannot be implemented exactly as computed. What happens when there are implementation errors
    x_j^fact = (1 + ξ_j) x_j^comp,  ξ_j ~ Uniform[−ρ, ρ]
with small ρ?
[Figure: dream and reality of the nominal optimal design: samples of 100 actual diagrams (red) for uncertainty levels ρ = 0, 0.0001, 0.001, 0.01; blue: the target diagram]

Quality of the nominal antenna design: dream and reality. Data over 100 samples of actuation errors per uncertainty level ρ:

                            Dream    |        Reality
                            ρ = 0    |  ρ = 0.0001          ρ = 0.001            ρ = 0.01
                            value    |  min   mean   max    min   mean   max     min    mean   max
  ‖·‖∞-distance to target   0.059    |  1.280 5.671  14.04  11.42 56.84  176.6   39.25  506.5  1484
  energy concentration      85.1%    |  0.5%  16.4%  51.0%  0.1%  16.5%  48.3%   0.5%   14.9%  47.1%

Conclusion: The nominal optimal design is completely meaningless...
Example: NETLIB Case Study
NETLIB: a collection of LO problems for testing LO algorithms.
Constraint # 372 of the NETLIB problem PILOT4:
    a^T x ≡ −15.79081 x_826 − 8.598819 x_827 − 1.88789 x_828 − 1.362417 x_829 − 1.526049 x_830
            − 0.031883 x_849 − 28.725555 x_850 − 10.792065 x_851 − 0.19004 x_852 − 2.757176 x_853
            − 12.290832 x_854 + 717.562256 x_855 − 0.057865 x_856 − 3.785417 x_857 − 78.30661 x_858
            − 122.163055 x_859 − 6.46609 x_860 − 0.48371 x_861 − 0.615264 x_862 − 1.353783 x_863
            − 84.644257 x_864 − 122.459045 x_865 − 43.15593 x_866 − 1.712592 x_870 − 0.401597 x_871
            + x_880 − 0.946049 x_898 − 0.946049 x_916
          ≥ b ≡ 23.387405    (C)
The related nonzero coordinates in the optimal solution x* of the problem, as reported by CPLEX, are:
    x*_826 = 255.6112787181108    x*_827 = 6240.488912232100
    x*_828 = 3624.613324098961    x*_829 = 18.20205065283259
    x*_849 = 174397.0389573037    x*_870 = 14250.00176680900
    x*_871 = 25910.00731692178    x*_880 = 104958.3199274139
This solution makes (C) an equality within machine precision.
Note: The coefficients in a, except for the coefficient 1 at x_880, are "ugly" reals like −15.79081 or −84.644257. Ugly coefficients characterize certain technological devices and processes; as such they could hardly be known to high accuracy, and they coincide with the true data within an accuracy of 3-4 digits, not more.
Question: Assuming that the ugly entries in a are 0.1%-accurate approximations of the true data ã, what is the effect of this uncertainty on the validity of the true constraint ã^T x ≥ b as evaluated at x*?
Answer: The minimum, over all 0.1% perturbations a → ã of the ugly entries in a, of the value of ã^T x* − b is < −104.9; that is, with 0.1% perturbations of the ugly coefficients, the violation of the constraint as evaluated at the nominal solution can be as large as 450% of the right hand side!
With independent random 0.1%-perturbations of the ugly coefficients,
• the violation of the constraint on average is as large as 125% of the right hand side;
• the probability of violating the constraint by at least 150% of the right hand side is as large as 0.18.
Among the 90 NETLIB problems, perturbing the ugly coefficients by just 0.01% results in violating some of the constraints, as evaluated at the nominal optimal solutions,
• by more than 50% in 13 problems,
• by more than 100% in 6 problems,
• by 210,000% in PILOT4.
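The worst-case figure can be reproduced from the data above with a few lines of Python (a sketch, not part of the original study). Only the ugly coefficients multiplying nonzero entries of x* matter, since perturbing a coefficient of a zero variable changes nothing:

```python
import numpy as np

# Ugly coefficients of (C) at the nonzero coordinates of x*, and the
# reported nonzero solution entries (x_880 carries the certain coefficient +1).
a_ugly = np.array([-15.79081, -8.598819, -1.88789, -1.362417,
                   -0.031883, -1.712592, -0.401597])
x_ugly = np.array([255.6112787181108, 6240.488912232100, 3624.613324098961,
                   18.20205065283259, 174397.0389573037,
                   14250.00176680900, 25910.00731692178])
x_880 = 104958.3199274139
b = 23.387405

nominal = a_ugly @ x_ugly + x_880   # a^T x*: equals b up to rounding of the printed data

# Worst case over all 0.1% perturbations: each ugly term a_j x_j can shrink
# by 0.1% of its magnitude, so the minimum of ã^T x* - b is
worst = nominal - 0.001 * np.abs(a_ugly * x_ugly).sum() - b
print(worst)   # about -104.9, i.e. a violation of roughly 450% of b
```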
Conclusion: In applications of LO, there exists a real need for a technique capable of detecting cases when data uncertainty can heavily affect the quality of the nominal solution, and in these cases to generate a "reliable" solution, one that is immunized against uncertainty.
Robust Optimization is aimed at satisfying the above need.

Uncertain Linear Optimization Problems
Definition: An uncertain LO problem is a collection
    { min_x { c^T x + d : Ax ≤ b } }_{(c,d,A,b) ∈ U}    (LO_U)
of LO problems (instances) min_x { c^T x + d : Ax ≤ b } of common structure (i.e., with common numbers m of constraints and n of variables) with the data varying in a given uncertainty set U ⊂ R^{(m+1)×(n+1)}.
Usually we assume that the uncertainty set is parameterized, in an affine fashion, by a perturbation vector ζ varying in a given perturbation set Z:
    U = { [c^T d; A b] = [c_0^T d_0; A_0 b_0] + Σ_{l=1}^L ζ_l [c_l^T d_l; A_l b_l] : ζ ∈ Z ⊂ R^L },
where D_0 = [c_0^T d_0; A_0 b_0] is the nominal data and D_l = [c_l^T d_l; A_l b_l], 1 ≤ l ≤ L, are the basic shifts.
Example: When speaking about PILOT4, we tacitly used the following model of uncertainty:
Uncertainty affects only the ugly coefficients {a_ij : (i, j) ∈ J} in the constraint matrix, and every one of them is allowed to run, independently of all other coefficients, through the interval
    [a^n_ij − ρ_ij |a^n_ij|, a^n_ij + ρ_ij |a^n_ij|]
• a^n_ij: nominal values of the data,
• ρ_ij: perturbation levels (which in the experiment were set to ρ = 0.001).
Perturbation set: the box
    Z = { ζ = {ζ_ij}_{(i,j) ∈ J} : −ρ_ij ≤ ζ_ij ≤ ρ_ij }
Parameterization of the data by the perturbation vector:
    [c^T d; A b] = [(c^n)^T d^n; A^n b^n] + Σ_{(i,j) ∈ J} ζ_ij |a^n_ij| e_i e_j^T
    { min_x { c^T x + d : Ax ≤ b } }_{(c,d,A,b) ∈ U}    (LO_U)
    U = { D_0 + Σ_{l=1}^L ζ_l D_l : ζ ∈ Z ⊂ R^L }
    (D_0: nominal data, D_l: basic shifts)
There is no universally defined notion of a solution to a family of optimization problems like (LO_U). Consider the following decision environment:
A.1. All decision variables in (LO_U) represent "here and now" decisions; they should be assigned specific numerical values, as a result of solving the problem, before the actual data "reveals itself."
A.2. The decision maker is fully responsible for the consequences of the decisions to be made when, and only when, the actual data is within the prespecified uncertainty set U.
A.3. The constraints in (LO_U) are hard: we cannot tolerate violations of constraints, even small ones, when the data is in U.
    { min_x { c^T x + d : Ax ≤ b } }_{(c,d,A,b) ∈ U}    (LO_U)
In the above decision environment, the only meaningful candidate solutions to (LO_U) are the robust feasible ones.
Definition: x ∈ R^n is called a robust feasible solution to (LO_U) if x is feasible for all instances:
    Ax ≤ b  ∀(c, d, A, b) ∈ U.
Indeed, by A.1 a meaningful candidate solution should be independent of the data, i.e., it should be just a fixed vector x. By A.2-3, it should satisfy the constraints whatever the realization of the data from U.
Acting in the same worst-case-oriented fashion, it makes sense to quantify the quality of a candidate solution x by the guaranteed (the worst, over the data from U) value of the objective:
    sup { c^T x + d : (c, d, A, b) ∈ U }.
    { min_x { c^T x + d : Ax ≤ b } }_{(c,d,A,b) ∈ U}    (LO_U)
Now we can associate with (LO_U) the problem of finding the best, in terms of the guaranteed value of the objective, among the robust feasible solutions:
    min_{t,x} { t : c^T x + d ≤ t, Ax ≤ b  ∀(c, d, A, b) ∈ U }    (RC)
This is called the Robust Counterpart of (LO_U).
Note: Passing from LOs of the form min_x { c^T x + d : Ax ≤ b } to their equivalents min_{t,x} { t : c^T x + d ≤ t, Ax ≤ b }, we always may assume that the objective is certain, and the RC respects this equivalence.
⇒ We lose nothing by assuming the objective in (LO_U) certain, in which case we can think of U as of a set in the space R^{m×(n+1)} of the [A, b]-data, and the RC reads
    min_x { c^T x : Ax ≤ b  ∀[A, b] ∈ U }.    (RC)
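When U is a finite set of instances, the RC with certain objective is itself an LO program: one simply stacks the constraints of all instances into one system. A minimal sketch (Python with scipy; the data is synthetic, chosen only for illustration):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(4)
n, m, S = 3, 2, 5                # variables, constraints, scenarios
c = -np.ones(n)                  # maximize the sum of the x_j
A_nom = np.ones((m, n))
b = np.ones(m)
# Finite uncertainty set: S perturbed copies of the constraint matrix.
scenarios = [A_nom + 0.1 * rng.uniform(-1.0, 1.0, size=(m, n))
             for _ in range(S)]

# RC with certain objective: min c^T x s.t. A_s x <= b for every scenario s.
res = linprog(c, A_ub=np.vstack(scenarios), b_ub=np.tile(b, S),
              bounds=[(0, None)] * n)
assert res.success
# The robust solution is feasible for every instance:
for A_s in scenarios:
    assert np.all(A_s @ res.x <= b + 1e-9)
```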
    { min_x { c^T x : Ax ≤ b } }_{[A,b] ∈ U}    (LO_U)
    min_x { c^T x : Ax ≤ b  ∀[A, b] ∈ U }    (RC)
Fact I: The RC of an uncertain LO with certain objective is a purely "constraint-wise" construction: when building the RC, we replace every constraint a_i^T x ≤ b_i of the instances with its RC
    a_i^T x ≤ b_i  ∀[a_i^T, b_i] ∈ U_i,    (RC_i)
where U_i is the projection of the uncertainty set U onto the space of data [a_i^T, b_i] of the i-th constraint.
Fact II: The RC remains intact when extending the uncertainty set U to its closed convex hull. When (LO_U) has certain objective, the RC remains intact when extending U to the direct product of the closed convex hulls of the U_i. Thus, the transformation
    U → U^+ = [cl Conv(U_1)] × ... × [cl Conv(U_m)]
keeps the RC intact.
⇒ From now on, we always assume the uncertainty set U convex, and the perturbation set Z convex and closed.
    { min_x { c^T x : Ax ≤ b } }_{[A,b] ∈ U}    (LO_U)
    min_x { c^T x : Ax ≤ b  ∀[A, b] ∈ U }    (RC)
The central questions associated with the concept of the RC are:
A. What is the computational status of the RC? When is it possible to process the RC efficiently?
⇒ to be addressed in depth below.
B. How to come up with meaningful uncertainty sets?
⇒ a modeling issue NOT to be addressed in this talk.
    min_x { c^T x : Ax ≤ b  ∀[A, b] ∈ U }    (RC)
Potentially bad news: The RC is a semi-infinite optimization problem (finitely many variables, infinitely many constraints) and as such can be computationally intractable.
Example: Consider the "nearly linear" semi-infinite constraint
    ‖Px − p‖_1 ≤ 1  ∀[P, p] ∈ U,  U = { [P, p] : p = Bζ, ‖ζ‖_2 ≤ 1 }.
To check whether x = 0 is robust feasible is, in general, NP-hard!
    min_x { c^T x : Ax ≤ b  ∀[A, b] ∈ U }    (RC)
Good news: The RC of an uncertain LO problem is computationally tractable, provided the uncertainty set U is so.
Explanation, I: The RC can be written down as the optimization problem
    min_x { c^T x : f_i(x) ≤ 0, i = 1, ..., m },  f_i(x) = sup_{[A,b] ∈ U} [ a_i^T x − b_i ].
The functions f_i(x) are convex (due to their origin) and efficiently computable (as maxima of affine functions over computationally tractable convex sets). Thus, the RC is a Convex Programming program with efficiently computable objective and constraints, and problems of this type are efficiently solvable.
Recalling that the RC is a constraint-wise construction, all we need is to reformulate in a tractable form a single semi-infinite constraint
    ∀ α = [a; b] ∈ {α^n + Aζ : ζ ∈ Z} ⊂ R^{n+1} :  α^T [x; 1] ≡ a^T x + b ≤ 0.    (*)
Consider several instructive cases when a tractable reformulation of (*) is easy and does not require any theory.
1. Scenario uncertainty Z = Conv{ζ^1, ..., ζ^N}. Setting α_j = α^n + Aζ^j, 1 ≤ j ≤ N, we get
    (*) ⇔ { α_j^T [x; 1] ≤ 0, 1 ≤ j ≤ N }.
2. ‖·‖_p-uncertainty Z = {ζ ∈ R^L : ‖ζ‖_p ≤ 1}, where
    ‖ζ‖_p = (Σ_{l=1}^L |ζ_l|^p)^{1/p} for 1 ≤ p < ∞,  ‖ζ‖_∞ = max_l |ζ_l|.
We have
    (*) ⇔ [α^n]^T [x; 1] + ‖A^T [x; 1]‖_{p_*} ≤ 0,  1/p + 1/p_* = 1.
In particular, interval uncertainty
    { α : α^n_j − δ_j ≤ α_j ≤ α^n_j + δ_j, 1 ≤ j ≤ n + 1 }
leads to
    (*) ⇔ [α^n]^T [x; 1] + Σ_{i=1}^n δ_i |x_i| + δ_{n+1} ≤ 0.
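For interval uncertainty, the closed-form worst case can be checked against brute-force enumeration of the vertices of the box (a small numpy sketch with synthetic data; the sup of a linear form over a box is attained at a vertex):

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n = 6
a_nom = rng.normal(size=n)                   # nominal constraint row
delta = rng.uniform(0.05, 0.2, size=n)       # per-entry perturbation radii
x = rng.normal(size=n)                       # a candidate solution

# Brute force: enumerate all 2^n sign patterns of the box vertices.
worst_bf = max((a_nom + np.array(s) * delta) @ x
               for s in itertools.product((-1.0, 1.0), repeat=n))

# Closed form from the interval-uncertainty reformulation:
worst_cf = a_nom @ x + delta @ np.abs(x)

assert np.isclose(worst_bf, worst_cf)
```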
General Well-Structured Case
Definition: Let us say that a set X ⊂ R^N is well-structured if it admits a well-structured representation, a representation of the form
    X = { x ∈ R^N : ∃u ∈ R^M :
          A_0 x + B_0 u + c_0 = 0,
          A_1 x + B_1 u + c_1 ∈ K_1,
          ...
          A_K x + B_K u + c_K ∈ K_K }
where every K_k, 1 ≤ k ≤ K, is a simple cone, specifically, either
• a nonnegative orthant R^m_+ = {x ∈ R^m : x ≥ 0}, m = m_k, or
• a Lorentz cone L^m = {x ∈ R^m : x_m ≥ sqrt(x_1^2 + ... + x_{m−1}^2)}, m = m_k, or
• a Semidefinite cone S^m_+, the cone of positive semidefinite matrices in the space S^m of real symmetric m×m matrices, m = m_k.
Example 1: The set X = {x ∈ R^N : ‖x‖_1 ≤ 1} admits the polyhedral representation
    X = { x ∈ R^N : ∃u ∈ R^N : −u_i ≤ x_i ≤ u_i ∀i, Σ_i u_i ≤ 1 }
      = { x ∈ R^N : ∃u ∈ R^N : [u_1 − x_1; u_1 + x_1; ...; u_N − x_N; u_N + x_N; 1 − Σ_i u_i] ∈ R^{2N+1}_+ }.
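Example 1 can be exercised directly: minimizing a linear objective over the l1 ball via the lifted polyhedral representation must give −‖c‖_∞. A sketch using scipy's linprog (synthetic objective):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
n = 4
c = rng.normal(size=n)

# Lifted LP in z = [x; u]:  x - u <= 0,  -x - u <= 0,  sum(u) <= 1.
I = np.eye(n)
A_ub = np.block([[ I, -I],
                 [-I, -I],
                 [np.zeros((1, n)), np.ones((1, n))]])
b_ub = np.zeros(2 * n + 1)
b_ub[-1] = 1.0
res = linprog(np.r_[c, np.zeros(n)], A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * n + [(0, None)] * n)

# Minimizing c^T x over the l1 ball gives -||c||_inf.
assert res.success
assert np.isclose(res.fun, -np.abs(c).max())
```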
Example 2: The set X = {x ∈ R^4_+ : x_1 x_2 x_3 x_4 ≥ 1} admits the conic quadratic representation
    X = { x ∈ R^4_+ : ∃u ∈ R^3 : 0 ≤ u_1 ≤ sqrt(x_1 x_2), 0 ≤ u_2 ≤ sqrt(x_3 x_4), 1 ≤ u_3 ≤ sqrt(u_1 u_2) }
      = { x ∈ R^4 : ∃u ∈ R^3 :
          [x_1; x_2; x_3; x_4; u_1; u_2; u_3 − 1] ∈ R^7_+,
          [2u_1; x_1 − x_2; x_1 + x_2] ∈ L^3,
          [2u_2; x_3 − x_4; x_3 + x_4] ∈ L^3,
          [2u_3; u_1 − u_2; u_1 + u_2] ∈ L^3 }.
Example 3: The set X of m×n matrices X with nuclear norm (sum of singular values) ≤ 1 admits the semidefinite representation
    X = { X ∈ R^{m×n} : ∃u = (U ∈ S^m, V ∈ S^n) :
          Tr(U) + Tr(V) ≤ 2,  [U  X; X^T  V] ⪰ 0 }.
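The Lorentz-cone memberships in Example 2 encode hyperbolic constraints of the form u² ≤ x_1 x_2: a quick numpy sketch verifying the algebraic equivalence on random points:

```python
import numpy as np

rng = np.random.default_rng(3)
# [2u; x1 - x2; x1 + x2] in L^3 means (2u)^2 + (x1 - x2)^2 <= (x1 + x2)^2
# (with x1 + x2 >= 0), which for x1, x2 >= 0 is exactly u^2 <= x1 * x2.
mismatches = 0
for _ in range(1000):
    x1, x2 = rng.uniform(0.0, 5.0, size=2)
    u = rng.uniform(-5.0, 5.0)
    in_cone = (2 * u) ** 2 + (x1 - x2) ** 2 <= (x1 + x2) ** 2
    if in_cone != (u ** 2 <= x1 * x2):
        mismatches += 1
assert mismatches == 0
```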
    X = { x ∈ R^N : ∃u ∈ R^M : A_0 x + B_0 u + c_0 = 0, A_k x + B_k u + c_k ∈ K_k, 1 ≤ k ≤ K }    (*)
Good news on well-structured representations:
• Computational tractability: Minimizing a linear objective over a set given by (*) reduces to solving the well-structured conic program
    min_{x,u} { c^T x : A_0 x + B_0 u + c_0 = 0, A_k x + B_k u + c_k ∈ K_k, 1 ≤ k ≤ K }
and thus can be done in a theoretically (and to some extent also practically) efficient manner by polynomial time interior point algorithms.
• Extremely powerful "expressive abilities": w.-s.r.'s admit a simple fully algorithmic calculus which makes it easy to build a w.-s.r. for the result of a convexity-preserving operation with convex sets (like taking intersections, direct sums, affine images, inverse affine images, polars, etc.) via w.-s.r.'s of the operands. As a result, for all practical purposes, all computationally tractable convex sets arising in Optimization admit explicit w.-s.r.'s.
Example: The set given by the constraints
(a) Ax = b, x ≥ 0,
(b) (Σ_{i=1}^8 x_i^3)^{1/3} ≤ x_2^{1/7} x_3^{2/7} x_4^{3/7} + 2 x_1^{1/5} x_5^{2/5} x_6^{1/5},
(c) two matrix inequalities requiring positive semidefiniteness of symmetric matrices whose entries are simple (affine and power) functions of x_1, ..., x_8,
(d) x_1 + x_2 sin(ϕ) + x_3 sin(2ϕ) + x_4 sin(4ϕ) ≥ 0  ∀ϕ ∈ [0, π/2]
admits an explicit w.-s.r. This w.-s.r. can be built fully automatically, by a compiler, and this is how CVX works.
Coming back to the Robust Counterpart of a scalar linear inequality, we want to answer the following
Question: How to describe in a tractable form the fact that a vector x satisfies the semi-infinite constraint
    ∀ α = [a; b] ∈ {α^n + Aζ : ζ ∈ Z} ⊂ R^{n+1} :  α^T [x; 1] ≡ a^T x + b ≤ 0.    (*)
We are about to give a nice answer in the case when the perturbation set Z is well structured.
Substituting α = α^n + Aζ into α^T [x; 1] ≤ 0, we arrive at the inequality
    ζ^T A^T [x; 1] + [α^n]^T [x; 1] ≤ 0,
or, which is the same,
    (p(x))^T ζ ≤ q(x),  where p(x) = A^T [x; 1], q(x) = −[α^n]^T [x; 1].    (!)
Note that p(x) and q(x) are affine in x, and that x satisfies (*) if and only if (!) holds true for every ζ ∈ Z, that is, if and only if the relation (!), considered as a scalar linear inequality in the variables ζ, is a consequence of the constraints defining Z.
Thus, our goal is to understand when a scalar linear inequality is a consequence of the system of constraints defining a well-structured set. The answer is given by Conic Duality.
Excursion to Conic Duality
Situation: We are given a system of linear equations and conic constraints
    A_0 w − b_0 = 0  &  A_i w − b_i ∈ K_i, 1 ≤ i ≤ I    (C)
• w: variable from a Euclidean space E with inner product ⟨·,·⟩,
• K_i: closed convex pointed cones with nonempty interiors in Euclidean spaces E_i with inner products ⟨·,·⟩_i,
• w → A_i w − b_i : E → E_i: affine mappings.
Case of primary interest: Every K_i is either
• a nonnegative orthant R^{m_i}_+ = {y ∈ R^{m_i} : y ≥ 0} (E_i = R^{m_i}, ⟨y, z⟩_i = y^T z), or
• a Lorentz cone L^{m_i} = {y ∈ R^{m_i} : y_{m_i} ≥ ‖[y_1; ...; y_{m_i−1}]‖_2} (E_i = R^{m_i}, ⟨y, z⟩_i = y^T z), or
• a semidefinite cone S^{m_i}_+ (E_i = S^{m_i}, the space of m_i × m_i symmetric matrices with the Frobenius inner product ⟨y, z⟩_i = Tr(yz) = Σ_{i,j} y_ij z_ij).
Goal: To understand when a scalar linear inequality
    ⟨p, w⟩ ≥ q    (*)
is a consequence of (C), meaning that (*) is satisfied at every solution to (C).
Given a system
    A_0 w − b_0 = 0  &  A_i w − b_i ∈ K_i, 1 ≤ i ≤ I    (C)
of linear equations and conic constraints (w: variable from a Euclidean space (E, ⟨·,·⟩); K_i: closed convex pointed cones with nonempty interiors in Euclidean spaces (E_i, ⟨·,·⟩_i)), we want to understand when a scalar linear inequality
    ⟨p, w⟩ ≥ q    (*)
is a consequence of the system.
Linear aggregation: Choose Lagrange multipliers y_i, 0 ≤ i ≤ I, such that for i ≥ 1, y_i belongs to the cone dual to K_i:
    i > 0 :  y_i ∈ K_i^* := { y : ⟨y, u⟩_i ≥ 0 ∀u ∈ K_i }.
Since y_i ∈ K_i^*, the conic constraint A_i w − b_i ∈ K_i implies that ⟨y_i, A_i w − b_i⟩_i ≥ 0, that is,
    (C) ⇒ ⟨A_i^* y_i, w⟩ ≥ ⟨b_i, y_i⟩_i    (+)
(here A^* : F → E is the conjugate of A : E → F: ⟨A^* y, x⟩_E ≡ ⟨y, Ax⟩_F). The equation A_0 w − b_0 = 0 implies (+) for i = 0 as well. Summing up the inequalities (+) over i = 0, 1, ..., I, we arrive at the following conclusion:
Whenever y_i, 0 ≤ i ≤ I, are such that y_i ∈ K_i^* when i ≥ 1, and
    Σ_{i=0}^I A_i^* y_i = p  and  Σ_{i=0}^I ⟨b_i, y_i⟩_i ≥ q,
the scalar linear inequality (*) is a consequence of (C).
    A_0 w − b_0 = 0  &  A_i w − b_i ∈ K_i, 1 ≤ i ≤ I    (C)
    ⟨p, w⟩ ≥ q    (*)
Conic Farkas Lemma: Let (C) be strictly feasible:
    ∃ w̄ :  A_0 w̄ − b_0 = 0  &  A_i w̄ − b_i ∈ int K_i, 1 ≤ i ≤ I.
Then (*) is a consequence of (C) if and only if (*) can be obtained from (C) by linear aggregation, i.e., if and only if
    ∃ y_0, {y_i ∈ K_i^*}_{i=1}^I :  Σ_{i=0}^I A_i^* y_i = p  and  Σ_{i=0}^I ⟨b_i, y_i⟩_i ≥ q.
When all K_i are nonnegative orthants, the statement remains true when strict feasibility is replaced with feasibility.
Corollary [Conic Duality]: Consider a conic problem
    Opt(P) = min_w { ⟨c, w⟩ : A_0 w = b_0  &  A_i w − b_i ∈ K_i, 1 ≤ i ≤ I }    (P)
along with its dual problem
    Opt(D) = max_{y_0,...,y_I} { Σ_{i=0}^I ⟨b_i, y_i⟩_i : y_i ∈ K_i^* for i ≥ 1, Σ_{i=0}^I A_i^* y_i = c }.    (D)
Weak Duality: One always has Opt(D) ≤ Opt(P).
Strong Duality: Assuming that (P) is strictly feasible and bounded below, (D) is solvable and Opt(P) = Opt(D).
When all K_i are nonnegative orthants, Strong Duality holds when strict feasibility of (P) is replaced with feasibility.
Proof of Strong Duality: Apply the Conic Farkas Lemma to the inequality ⟨c, w⟩ ≥ Opt(P), which clearly is a consequence of the system of constraints in (P).
Note: Nonnegative orthants, Lorentz cones, and semidefinite cones are self-dual. When all K_i are nonnegative orthants, or Lorentz cones, or semidefinite cones, the constraints y_i ∈ K_i^* in (D) take the form y_i ∈ K_i.
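In the LP case (all K_i nonnegative orthants), Weak and Strong Duality can be observed numerically with any LP solver. A sketch with scipy's linprog on a small hand-made primal/dual pair:

```python
import numpy as np
from scipy.optimize import linprog

# LP duality:
#   primal: min c^T x  s.t.  A x >= b, x >= 0
#   dual:   max b^T y  s.t.  A^T y <= c, y >= 0
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([4.0, 6.0])
c = np.array([3.0, 5.0])

primal = linprog(c, A_ub=-A, b_ub=-b)   # Ax >= b as -Ax <= -b; x >= 0 is the default
dual = linprog(-b, A_ub=A.T, b_ub=c)    # maximize b^T y via minimizing -b^T y
assert primal.success and dual.success
# Strong Duality: the optimal values coincide.
assert np.isclose(primal.fun, -dual.fun)
```

Here both problems are feasible and bounded, so Strong Duality applies; the common optimal value is 11.6, attained at x = (1.2, 1.6) and y = (0.8, 1.4).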
The Conic Farkas Lemma is really deep already in the case when all conic inequalities in question are scalar: K_i = R_+ for all i. In this case it becomes the usual Farkas Lemma:
A linear inequality
    p^T x ≤ q    (*)
is a consequence of a solvable system of linear inequalities Ax ≤ b if and only if there exists y ≥ 0 such that
    A^T y = p  &  b^T y ≤ q,
or, which is the same, if and only if the target inequality (*) can be obtained by linear aggregation (taking a weighted sum with nonnegative weights) of the inequalities from the system and the identically true inequality 0^T x ≤ 1.
While simply-looking, this fact is really deep. Indeed, consider the following derivation:
    { −1 ≤ x ≤ 1, −1 ≤ y ≤ 1 } ⇒ { x² ≤ 1, y² ≤ 1 } ⇒ x² + y² ≤ 2
    ⇒ x + y = 1·x + 1·y ≤ [1/2 + x²/2] + [1/2 + y²/2] ≤ 2
[we have used the Cauchy Inequality].
In this chain of implications, the starting point is a solvable system of linear inequalities, and the conclusion x + y ≤ 2 is a linear inequality as well; however, the steps use highly nonlinear arguments. According to the Farkas Lemma, every chain of this type, however long and with whatever complicated steps, can be replaced with linear aggregation. A statement with that huge predictive power indeed must be deep!
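Finding the aggregation weights promised by the Farkas Lemma is itself an LP. The sketch below (scipy) recovers a certificate for x + y ≤ 2 from the system −1 ≤ x ≤ 1, −1 ≤ y ≤ 1:

```python
import numpy as np
from scipy.optimize import linprog

# System A z <= b in z = (x, y):  -x <= 1, x <= 1, -y <= 1, y <= 1.
A = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, -1.0], [0.0, 1.0]])
b = np.ones(4)
p = np.array([1.0, 1.0])   # target inequality: x + y <= 2
q = 2.0

# Search for weights y >= 0 with A^T y = p minimizing b^T y; the target
# is a consequence of the system iff the minimum is <= q.
res = linprog(b, A_eq=A.T, b_eq=p)   # y >= 0 is linprog's default
assert res.success and res.fun <= q + 1e-9
print(res.x)   # the aggregation weights, here (0, 1, 0, 1): add "x <= 1" and "y <= 1"
```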
Back to Robust Linear Optimization
Now we are ready to answer the question: when does the scalar linear inequality in the variables ζ
    (p(x))^T ζ ≤ q(x)    (!)
(p(x), q(x) affine in x) hold true for all ζ ∈ Z, where Z is well structured:
    Z = { ζ ∈ R^L : ∃u ∈ R^M : A_0 ζ + B_0 u + c_0 = 0, A_k ζ + B_k u + c_k ∈ K_k, 1 ≤ k ≤ K }    (*)
with K_k nonnegative orthants/Lorentz/semidefinite cones?
Assume that either all K_k are nonnegative orthants, or (*) is strictly feasible. We ask when the scalar linear inequality
    (p(x))^T ζ + 0^T u ≤ q(x)
in the variables w = [ζ; u] holds true for all [ζ; u] satisfying the constraints in (*). By the Conic Farkas Lemma, this is so if and only if there exist y_0, y_1, ..., y_K such that
    A_0^T y_0 + Σ_{k=1}^K A_k^T y_k = −p(x),
    B_0^T y_0 + Σ_{k=1}^K B_k^T y_k = 0,
    y_0^T c_0 + Σ_{k=1}^K y_k^T c_k ≤ q(x),
    y_k ∈ K_k^* = K_k, 1 ≤ k ≤ K.    (**)
Since p(x) and q(x) are affine in x, (**) is a system of linear equations and conic constraints in the variables x, y_0, y_1, ..., y_K.
We conclude that the set X of those x for which (!) holds true for all ζ ∈ Z is well-structured:
    X = { x : ∃ y_0, y_1, ..., y_K : (x, y_0, ..., y_K) satisfies (**) }.