MIT Sloan School of Management
Working Paper 4286-03
February 2003

On an Extension of Condition Number Theory to Non-Conic Convex Optimization

Robert M. Freund and Fernando Ordóñez

© 2003 by Robert M. Freund and Fernando Ordóñez. All rights reserved. Short sections of text, not to exceed two paragraphs, may be quoted without explicit permission, provided that full credit including © notice is given to the source. This paper also can be downloaded without charge from the Social Science Research Network Electronic Paper Collection: http://ssrn.com/abstract=382363

On an Extension of Condition Number Theory to Non-Conic Convex Optimization

Robert M. Freund and Fernando Ordóñez

February 14, 2003

Abstract

The purpose of this paper is to extend, as much as possible, the modern theory of condition numbers for conic convex optimization:

z_*(d) := min_x  c^t x   s.t.  Ax − b ∈ C_Y,  x ∈ C_X,

to the more general non-conic format:

(GP_d)   z_*(d) := min_x  c^t x   s.t.  Ax − b ∈ C_Y,  x ∈ P,

where P is any closed convex set, not necessarily a cone, which we call the ground-set. Although any convex problem can be transformed to conic form, such transformations are neither unique nor natural given the natural description of many problems, thereby diminishing the relevance of data-based condition number theory. Herein we extend the modern theory of condition numbers to the problem format (GP_d). As a byproduct, we are able to state and prove natural extensions of many theorems from the conic-based theory of condition numbers to this broader problem format.

Key words: Condition number, convex optimization, conic optimization, duality, sensitivity analysis, perturbation theory.

MIT Sloan School of Management, 50 Memorial Drive, Cambridge, MA 02142, USA, email: rfreund@mit.edu

Industrial and Systems Engineering, University of Southern California, GER-247, Los Angeles, CA 90089-0193, USA, email: fordon@usc.edu

1 Introduction

The modern theory of condition numbers for convex optimization problems was developed by Renegar in [16] and [17] for convex optimization problems in the following conic format:

(CP_d)   z_*(d) := min_x  c^t x   s.t.  Ax − b ∈ C_Y,  x ∈ C_X,   (1)

where C_X ⊆ X and C_Y ⊆ Y are closed convex cones, A is a linear operator from the n-dimensional vector space X to the m-dimensional vector space Y, b ∈ Y, and c ∈ X* (the space of linear functionals on X). The data d for (CP_d) is defined as d := (A, b, c).

The theory of condition numbers for (CP_d) focuses on three measures, ρ_P(d), ρ_D(d), and C(d), to bound various behavioral and computational quantities pertaining to (CP_d). The quantity ρ_P(d) is called the distance to primal infeasibility and is the smallest data perturbation Δd for which (CP_{d+Δd}) is infeasible. The quantity ρ_D(d) is called the distance to dual infeasibility for the conic dual (CD_d) of (CP_d):

(CD_d)   z^*(d) := max_y  b^t y   s.t.  c − A^t y ∈ C_X*,  y ∈ C_Y*,   (2)

and is defined similarly to ρ_P(d) but using the conic dual problem instead (which conveniently is of the same general conic format as the primal problem). The quantity C(d) is called the condition measure or the condition number of the problem instance d and is a (positively) scale-invariant reciprocal of the smallest data perturbation Δd that will render the perturbed data instance either primal or dual infeasible:

C(d) := ‖d‖ / min{ρ_P(d), ρ_D(d)},   (3)

for a suitably defined norm ‖·‖ on the space of data instances d. A problem is called ill-posed if min{ρ_P(d), ρ_D(d)} = 0, equivalently C(d) = ∞.

These three condition measure quantities have been shown in theory to be connected to a wide variety of bounds on behavioral characteristics of (CP_d) and its dual, including bounds on sizes of feasible solutions, bounds on sizes of optimal solutions, bounds on optimal objective values, bounds on the sizes and aspect ratios of inscribed balls in the feasible region, bounds on the rate of deformation of the feasible region under perturbation, bounds on changes in optimal objective values under perturbation, and numerical bounds related to the linear algebra computations of certain algorithms; see [16], [5], [4], [6], [7], [8], [21], [19], [22], [20], [14], [15]. In the context of interior-point methods for linear and semidefinite optimization, these same three condition measures have also been shown to

be connected to various quantities of interest regarding the central trajectory; see [10] and [11]. The connection of these condition measures to the complexity of algorithms has been shown in [6], [7], [17], [2], and [3], and some of the references contained therein.

The conic format (CP_d) covers a very general class of convex problems; indeed any convex optimization problem can be transformed to an equivalent instance of (CP_d). However, such transformations are not necessarily unique and are sometimes rather unnatural given the natural description and the natural data for the problem. The condition number theory developed in the aforementioned literature pertains only to convex optimization problems in conic form, and the relevance of this theory is diminished to the extent that many practical convex optimization problems are not conveyed in conic format. Furthermore, the transformation of a problem to conic form can result in dramatically different condition numbers depending on the choice of transformation; see the example in Section 2 of [13].

Motivated to overcome these shortcomings, herein we extend the condition number theory to non-conic convex optimization problems. We consider the more general format for convex optimization:

(GP_d)   z_*(d) = min_x  c^t x   s.t.  Ax − b ∈ C_Y,  x ∈ P,   (4)

where P is allowed to be any closed convex set, possibly unbounded, and possibly without interior. For example, P could be the solution set of box constraints of the form l ≤ x ≤ u, where some components of l and/or u might be unbounded, or P might be the solution set of network flow constraints of the form Nx = g, x ≥ 0. And of course, P might also be a closed convex cone. We call P the ground-set and we refer to (GP_d) as the ground-set model (GSM) format.

We present the definition of the condition number for problem instances of the more general GSM format in Section 2, where we also demonstrate some basic properties. A number of results from condition number theory are extended to the GSM format in the subsequent sections of the paper. In Section 3 we prove that a problem instance with a finite condition number has primal and dual Slater points, which in turn implies that strong duality holds for the problem instance and its dual. In Section 4 we provide characterizations of the condition number as the solution to associated optimization problems. In Section 5 we show that if the condition number of a problem instance is finite, then there exist primal and dual interior solutions that have good geometric properties. In Section 6 we show that the rate of deformation of primal and dual feasible regions and optimal objective function values due to changes in the data are bounded by functions of the condition number. Section 7 contains concluding remarks.

We now present the notation and general assumptions that we will use throughout the paper.

Notation and General Assumptions

We denote the variable space X by R^n and the constraint space Y by R^m. Therefore P ⊆ R^n, C_Y ⊆ R^m, A is an m by n real matrix, b ∈ R^m, and c ∈ R^n. The spaces X* and Y* of linear functionals on R^n and R^m can be identified with R^n and R^m, respectively. For v, w ∈ R^n or R^m, we write v^t w for the standard inner product. We denote by D the vector space of all data instances d = (A, b, c). A particular data instance is denoted equivalently by d or (A, b, c). We define the norm for a data instance d by ‖d‖ := max{‖A‖, ‖b‖, ‖c‖}, where the norms ‖x‖ and ‖y‖ on R^n and R^m are given, ‖A‖ denotes the usual operator norm, and ‖·‖* denotes the dual norm associated with the norm ‖·‖ on R^n or R^m, respectively. Let B(v, r) denote the ball centered at v with radius r, using the norm for the space of the variable v.

For a convex cone S, let S* denote the (positive) dual cone, namely S* := {s : s^t x ≥ 0 for all x ∈ S}. Given a set Q ⊆ R^n, we denote the closure and relative interior of Q by cl Q and relint Q, respectively. We use the convention that if Q is the singleton Q = {q}, then relint Q = Q. We adopt the standard conventions 1/0 = ∞ and 1/∞ = 0.

We also make the following two general assumptions:

Assumption 1: P ≠ ∅ and C_Y ≠ ∅.

Assumption 2: Either C_Y ≠ R^m or P is not bounded (or both).

Clearly if either P = ∅ or C_Y = ∅, problem (GP_d) is infeasible regardless of A, b, and c. Therefore Assumption 1 avoids settings wherein all problem instances are trivially inherently infeasible. Assumption 2 is needed to avoid settings where (GP_d) is feasible for every d = (A, b, c) ∈ D. This will be explained further in Section 2.

2 Condition Numbers for (GP_d) and its Dual

2.1 Distance to Primal Infeasibility

We denote the feasible region of (GP_d) by:

X_d := {x ∈ R^n : Ax − b ∈ C_Y, x ∈ P}.   (5)

Let F_P := {d ∈ D : X_d ≠ ∅}, i.e., F_P is the set of data instances for which (GP_d) has a feasible solution. Similar to the conic case, the primal distance to infeasibility, denoted by ρ_P(d), is defined as:

ρ_P(d) := inf {‖Δd‖ : X_{d+Δd} = ∅} = inf {‖Δd‖ : d + Δd ∉ F_P}.   (6)
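As a small illustration of the format and of definition (6) (a toy instance of ours, included only to fix ideas), note first that a standard-form linear program min{c^t x : Ax = b, l ≤ x ≤ u} is already in GSM format upon taking C_Y = {0} and the box P = {x : l ≤ x ≤ u}, with the data d = (A, b, c) unchanged. For an explicit computation of ρ_P(d), let n = m = 1, P = R_+, C_Y = R_+, and d = (A, b, c) = (1, −1, 0), so that X_d = {x ≥ 0 : x + 1 ≥ 0} = R_+. A perturbed instance (Ā, b̄, c̄) is primal infeasible exactly when Ā ≤ 0 and b̄ > 0: if Ā > 0 then all sufficiently large x ≥ 0 are feasible, if b̄ ≤ 0 then x = 0 is feasible, and if Ā ≤ 0 < b̄ then Āx − b̄ < 0 for every x ≥ 0. Any infeasible perturbation therefore has ‖Δd‖ = max{|ΔA|, |Δb|, |Δc|} ≥ 1, while ΔA = −1, Δb = 1 + δ, Δc = 0 is infeasible with ‖Δd‖ = 1 + δ for every δ > 0; hence ρ_P(d) = 1 by (6).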

2.2 The Dual Problem and Distance to Dual Infeasibility

In the case when P is a cone, the conic dual problem (2) is of the same basic format as the primal problem. However, when P is not a cone, we must first develop a suitable dual problem, which we do in this subsection. Before doing so we introduce a dual pair of cones associated with the ground-set P. Define the closed convex cone C by homogenizing P to one higher dimension:

C := cl {(x, t) ∈ R^n × R : x ∈ tP, t > 0},   (7)

and note that C = {(x, t) ∈ R^n × R : x ∈ tP, t > 0} ∪ (R × {0}), where R is the recession cone of P, namely

R := {v ∈ R^n : there exists x ∈ P for which x + θv ∈ P for all θ ≥ 0}.   (8)

It is straightforward to show that the (positive) dual cone C* of C is

C* := {(s, u) ∈ R^n × R : s^t x + u t ≥ 0 for all (x, t) ∈ C}
    = {(s, u) ∈ R^n × R : s^t x + u ≥ 0 for all x ∈ P}
    = {(s, u) ∈ R^n × R : inf_{x ∈ P} s^t x + u ≥ 0}.   (9)

The standard Lagrangian dual of (GP_d) can be constructed as:

max_{y ∈ C_Y*}  inf_{x ∈ P} { c^t x + (b − Ax)^t y },

which we re-write as:

max_{y ∈ C_Y*}  inf_{x ∈ P} { b^t y + (c − A^t y)^t x }.   (10)

With the help of (9) we re-write (10) as:

(GD_d)   z^*(d) = max_{y,u}  b^t y − u   s.t.  (c − A^t y, u) ∈ C*,  y ∈ C_Y*.   (11)

We consider the formulation (11) to be the dual problem of (4). The feasible region of (GD_d) is:

Y_d := {(y, u) ∈ R^m × R : (c − A^t y, u) ∈ C*, y ∈ C_Y*}.   (12)

Let F_D := {d ∈ D : Y_d ≠ ∅}, i.e., F_D is the set of data instances for which (GD_d) has a feasible solution. The dual distance to infeasibility, denoted by ρ_D(d), is defined as:

ρ_D(d) := inf {‖Δd‖ : Y_{d+Δd} = ∅} = inf {‖Δd‖ : d + Δd ∉ F_D}.   (13)
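To make the homogenization (7) and the dual cone (9) concrete, consider the one-dimensional ground-set P = [0, 1] (an illustrative example of ours). Here the recession cone is R = {0}, and

C = cl {(x, t) : 0 ≤ x ≤ t, t > 0} = {(x, t) : 0 ≤ x ≤ t},

while, since inf_{x ∈ [0,1]} s x = min{0, s}, formula (9) gives

C* = {(s, u) : min{0, s} + u ≥ 0} = {(s, u) : u ≥ 0 and s + u ≥ 0}.

Thus C* records, for each slope s, the smallest intercept u that keeps s^t x + u nonnegative on P.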

We also present an alternate form of (11), which does not use the auxiliary variable u, based on the function u(·) defined by

u(s) := −inf_{x ∈ P} s^t x.   (14)

It follows from Theorem 5.5 in [18] that u(·), being the support function of the set −P, is a convex function. The epigraph of u(·) is:

epi u(·) := {(s, v) ∈ R^n × R : v ≥ u(s)},

and the projection of the epigraph onto the space of the variables s is the effective domain of u(·):

effdom u(·) := {s ∈ R^n : u(s) < ∞}.

It then follows from (9) that C* = epi u(·), and so (GD_d) can alternatively be written as:

z^*(d) = max_y  b^t y − u(c − A^t y)   s.t.  c − A^t y ∈ effdom u(·),  y ∈ C_Y*.   (15)

Evaluating the inclusion (y, u) ∈ Y_d is not necessarily an easy task, as it involves checking the inclusion (c − A^t y, u) ∈ C*, and C* is an implicitly defined cone. A very useful tool for evaluating the inclusion (y, u) ∈ Y_d is given in the following proposition, where recall from (8) that R is the recession cone of P.

Proposition 1 If y satisfies y ∈ C_Y* and c − A^t y ∈ relint R*, then u(c − A^t y) is finite, and for all u ≥ u(c − A^t y) it holds that (y, u) is feasible for (GD_d).

Proof: Note from Proposition 11 of the Appendix that cl effdom u(·) = R* and from Proposition 12 of the Appendix that

c − A^t y ∈ relint R* = relint cl effdom u(·) = relint effdom u(·) ⊆ effdom u(·).

This shows that u(c − A^t y) is finite and (c − A^t y, u(c − A^t y)) ∈ C*. Therefore (y, u) is feasible for (GD_d) for all u ≥ u(c − A^t y).

2.3 Condition Number

A data instance d = (A, b, c) is consistent if both the primal and dual problems have feasible solutions. Let F denote the set of consistent data instances, namely F :=

F_P ∩ F_D = {d ∈ D : X_d ≠ ∅ and Y_d ≠ ∅}. For d ∈ F, the distance to infeasibility is defined as:

ρ(d) := min {ρ_P(d), ρ_D(d)}
      = inf {‖Δd‖ : X_{d+Δd} = ∅ or Y_{d+Δd} = ∅},   (16)

the interpretation being that ρ(d) is the size of the smallest perturbation of d which will render the perturbed problem instance either primal or dual infeasible. The condition number of the instance d is defined as

C(d) := ‖d‖/ρ(d)  if ρ(d) > 0,   and   C(d) := ∞  if ρ(d) = 0,

which is a (positively) scale-invariant reciprocal of the distance to infeasibility. This definition of condition number for convex optimization problems was first introduced by Renegar for problems in conic form in [16] and [17].

2.4 Basic Properties of ρ_P(d), ρ_D(d), and C(d), and Alternative Duality Results

The need for Assumptions 1 and 2 is demonstrated by the following:

Proposition 2 For any data instance d ∈ D,

1. ρ_P(d) = ∞ if and only if C_Y = R^m.

2. ρ_D(d) = ∞ if and only if P is bounded.

The proof of this proposition relies on Lemmas 1 and 2, which are versions of theorems of the alternative for primal and dual feasibility of (GP_d) and (GD_d). These two lemmas are stated and proved at the end of this section.

Proof of Proposition 2: Clearly C_Y = R^m implies that ρ_P(d) = ∞. Also, if P is bounded, then R = {0} and R* = R^n, whereby from Proposition 1 we have that (GD_d) is feasible for any d, and so ρ_D(d) = ∞. Therefore for both items it only remains to prove the converse implication. Recall that we denote d = (A, b, c).

Assume that ρ_P(d) = ∞, and suppose that C_Y ≠ R^m. Then C_Y* ≠ {0}; consider a point ỹ ∈ C_Y*, ỹ ≠ 0. Define the perturbation Δd = (ΔA, Δb, Δc) = (−A, −b + ỹ, −c) and d̄ = d + Δd = (0, ỹ, 0). Then the point (y, u) = (ỹ, ỹ^t ỹ/2) satisfies the alternative system (A2_{d̄}) of Lemma 1 for the data d̄ = (0, ỹ, 0), whereby X_{d̄} = ∅. Therefore ρ_P(d) ≤ ‖d̄ − d‖ < ∞, a contradiction, and so C_Y = R^m.

Now assume that ρ_D(d) = ∞, and suppose that P is not bounded, and so R ≠ {0}. Consider x̃ ∈ R, x̃ ≠ 0, and define the perturbation Δd = (−A, −b, −c − x̃). Then the point x̃ satisfies the alternative system (B2_{d̄}) of Lemma 2 for the data d̄ = d + Δd = (0, 0, −x̃), whereby Y_{d̄} = ∅. Therefore ρ_D(d) ≤ ‖d̄ − d‖ < ∞, a contradiction, and so P is bounded.

Remark 1 If d ∈ F, then C(d) ≥ 1.

Proof: Consider the data instance d_0 = (0, 0, 0). Note that X_{d_0} = P and Y_{d_0} = C_Y* × R_+, therefore d_0 ∈ F. If C_Y ≠ R^m, consider b̃ ∈ R^m \ C_Y, b̃ ≠ 0, and for any ε > 0 define the instance d_ε = (0, −εb̃, 0). This instance is such that for any ε > 0, X_{d_ε} = ∅, which means that d_ε ∉ F_P and therefore ρ_P(d) ≤ inf_{ε>0} ‖d − d_ε‖ ≤ ‖d‖. If C_Y = R^m, then Assumption 2 implies that P is unbounded. This means that there exists a ray r̃ ∈ R, r̃ ≠ 0. For any ε > 0 the instance d_ε = (0, 0, −εr̃) is such that Y_{d_ε} = ∅, which means that d_ε ∉ F_D and therefore ρ_D(d) ≤ inf_{ε>0} ‖d − d_ε‖ ≤ ‖d‖. In each case we have ρ(d) = min{ρ_P(d), ρ_D(d)} ≤ ‖d‖, which implies the result.

The following two lemmas present weak and strong alternative results for (GP_d) and (GD_d), and are used in the proofs of Proposition 2 and elsewhere.

Lemma 1 Consider the following systems with data d = (A, b, c):

(X_d)   Ax − b ∈ C_Y,  x ∈ P;

(A1_d)   (−A^t y, u) ∈ C*,  b^t y ≥ u,  y ≠ 0,  y ∈ C_Y*;

(A2_d)   (−A^t y, u) ∈ C*,  b^t y > u,  y ∈ C_Y*.

If system (X_d) is infeasible, then system (A1_d) is feasible. Conversely, if system (A2_d) is feasible, then system (X_d) is infeasible.

Proof: Assume that system (X_d) is infeasible. This implies that b ∉ S := {Ax − v : x ∈ P, v ∈ C_Y}, which is a nonempty convex set. Using Proposition 10 we can separate b from S, and therefore there exists y ≠ 0 such that y^t (Ax − v) ≤ y^t b for all x ∈ P, v ∈ C_Y. Set u := y^t b; then the inequality implies that y ∈ C_Y* and that (−A^t y)^t x + u ≥ 0 for any x ∈ P. Therefore (−A^t y, u) ∈ C* and (y, u) satisfies system (A1_d).

Conversely, if both (A2_d) and (X_d) are feasible then

0 ≤ y^t (Ax − b) = (A^t y)^t x − b^t y < (A^t y)^t x − u = −((−A^t y)^t x + u) ≤ 0,

a contradiction.
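For orientation (a specialization recorded only for illustration), consider what Lemma 1 says in the classical setting P = R^n_+ and C_Y = {0}, so that (X_d) is the system Ax = b, x ≥ 0. In this case C = C* = R^n_+ × R_+ and C_Y* = R^m, so (A2_d) reads: A^t y ≤ 0, b^t y > u for some u ≥ 0, i.e., A^t y ≤ 0 and b^t y > 0, while (A1_d) is the weaker system A^t y ≤ 0, b^t y ≥ 0, y ≠ 0. Lemma 1 thus extends to the ground-set setting the easy direction of Farkas' lemma ((A2_d) feasible implies (X_d) infeasible) together with a weak alternative ((X_d) infeasible implies (A1_d) feasible).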

Lemma 2 Consider the following systems with data d = (A, b, c):

(Y_d)   (c − A^t y, u) ∈ C*,  y ∈ C_Y*;

(B1_d)   Ax ∈ C_Y,  c^t x ≤ 0,  x ≠ 0,  x ∈ R;

(B2_d)   Ax ∈ C_Y,  c^t x < 0,  x ∈ R.

If system (Y_d) is infeasible, then system (B1_d) is feasible. Conversely, if system (B2_d) is feasible, then system (Y_d) is infeasible.

Proof: Assume that system (Y_d) is infeasible; this implies that

(0, 0, 0) ∉ S := { (s, v, q) : there exist (y, u) such that (c − A^t y, u) + (s, v) ∈ C*, y + q ∈ C_Y* },

which is a nonempty convex set. Using Proposition 10 we separate the point (0, 0, 0) from S, and therefore there exists (x, δ, z) ≠ 0 such that x^t s + δ v + z^t q ≥ 0 for all (s, v, q) ∈ S. For any (y, u), (s̃, ṽ) ∈ C*, and q̃ ∈ C_Y*, define s = −(c − A^t y) + s̃, v = −u + ṽ, and q = −y + q̃. By construction (s, v, q) ∈ S, and therefore for any y, u, (s̃, ṽ) ∈ C*, q̃ ∈ C_Y* we have

−x^t c + (Ax − z)^t y + x^t s̃ − δu + δṽ + z^t q̃ ≥ 0.

The above inequality implies that δ = 0, Ax = z ∈ C_Y, x ∈ R, and c^t x ≤ 0. In addition x ≠ 0, because otherwise (x, δ, z) = (x, 0, Ax) = 0. Therefore (B1_d) is feasible.

Conversely, if both (B2_d) and (Y_d) are feasible then

0 ≤ x^t (c − A^t y) = c^t x − y^t Ax < −y^t Ax ≤ 0,

a contradiction.

3 Slater Points, Distance to Infeasibility, and Strong Duality

In this section we prove that the existence of a Slater point in either (GP_d) or (GD_d) is sufficient to guarantee that strong duality holds for these problems. We then show that a positive distance to infeasibility implies the existence of Slater points, and use these results to show that strong duality holds whenever ρ_P(d) > 0 or ρ_D(d) > 0. We first state a weak duality result.

Proposition 3 Weak duality holds between (GP_d) and (GD_d), that is, z_*(d) ≥ z^*(d).

Proof: Consider x and (y, u) feasible for (GP_d) and (GD_d), respectively. Then

0 ≤ (c − A^t y)^t x + u = c^t x − y^t Ax + u ≤ c^t x − b^t y + u,

where the last inequality follows from y^t (Ax − b) ≥ 0. Therefore z_*(d) ≥ z^*(d).

A classic constraint qualification in the history of constrained optimization is the existence of a Slater point in the feasible region; see for example Theorem 30.4 of [18] or Chapter 5 of [1]. We now define a Slater point for problems in the GSM format.

Definition 1 A point x is a Slater point for problem (GP_d) if x ∈ relint P and Ax − b ∈ relint C_Y. A point (y, u) is a Slater point for problem (GD_d) if y ∈ relint C_Y* and (c − A^t y, u) ∈ relint C*.

We now present the statements of the main results of this section, deferring the proofs to the end of the section. The following two theorems show that the existence of a Slater point in the primal or dual is sufficient to guarantee strong duality as well as attainment in the dual or the primal problem, respectively.

Theorem 1 If x̄ is a Slater point for problem (GP_d), then z_*(d) = z^*(d). If in addition z_*(d) > −∞, then Y_d ≠ ∅ and problem (GD_d) attains its optimum.

Theorem 2 If (ȳ, ū) is a Slater point for problem (GD_d), then z_*(d) = z^*(d). If in addition z^*(d) < ∞, then X_d ≠ ∅ and problem (GP_d) attains its optimum.

The next three results show that a positive distance to infeasibility is sufficient to guarantee the existence of a Slater point for the primal and the dual problems, respectively, and hence is sufficient to ensure that strong duality holds. The fact that a positive distance to infeasibility implies the existence of an interior point in the feasible region is shown for the conic case in Theorems 15, 17, and 19 in [8] and Theorem 3.1 in [17].

Theorem 3 Suppose that ρ_P(d) > 0. Then there exists a Slater point for (GP_d).

Theorem 4 Suppose that ρ_D(d) > 0. Then there exists a Slater point for (GD_d).

Corollary 1 (Strong Duality) If ρ_P(d) > 0 or ρ_D(d) > 0, then z_*(d) = z^*(d). If ρ(d) > 0, then both the primal and the dual attain their respective optimal values.

Proof: The proof of this result is a straightforward consequence of Theorems 1, 2, 3, and 4.
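To illustrate Corollary 1 on a small instance (ours, for illustration), take n = m = 1, P = R_+, C_Y = R_+, and d = (A, b, c) = (1, −1, 1), for which C* = R_+ × R_+. Problem (GP_d) is min{x : x + 1 ≥ 0, x ≥ 0}, so z_*(d) = 0 is attained at x = 0, while (GD_d) is max{−y − u : 1 − y ≥ 0, y ≥ 0, u ≥ 0}, so z^*(d) = 0 is attained at (y, u) = (0, 0). One can check directly that ρ_P(d) ≥ 1 and ρ_D(d) ≥ 1 for this instance, so ρ(d) > 0 and the conclusion of Corollary 1 (zero duality gap with attainment in both problems) is confirmed.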

Note that the contrapositive of Corollary 1 says that if d F and z d) > z d), then = = 0 and so ρd) = 0 In other words, if a data instance d is primal and dual feasible but has a positive optimal duality gap, then d must necessarily be arbitrarily close to being both primal infeasible and dual infeasible Proof of Theorem 1: For simplicity, let z and z denote the primal and dual optimal objective values, respectively The interesting case is when z >, otherwise weak duality implies that GD d ) is infeasible and z = z = If z > the point 0, 0, 0) does not belong to the non-empty convex set S := { p, q, α) x st x + p P, Ax b + q C Y, c t x α < z } We use Proposition 10 to properly separate 0, 0, 0) from S, which implies that there exists γ, y, π) 0 such that γ t p + y t q + πα 0 for all p, q, α) S Note that π 0 because α is not upper bounded in the definition of S If π > 0, re-scale γ, y, π) such that π = 1 For any x IR n, p P, q C Y, and ε > 0 define p = x + p, q = Ax + b + q, and α = c t x z + ε By construction the point p, q, α) S and the proper separation implies that for all x, p P, q C Y, and ε > 0 0 γ t x + p) + y t Ax + b + q) + c t x z + ε = A t y + c γ) t x + γ t p + y t q + y t b z + ε This expression implies that c A t y = γ, y C Y, and c A t y, u) C for u := y t b z Therefore y, u) is feasible for GD d ) and z b t y u = b t y y t b+z = z z, which implies that z = z and the dual feasible point y, u) attains the dual optimum If π = 0, the same construction used above and proper separation gives the following inequality for all x, p P, and q C Y 0 γ t x + p) + y t Ax + b + q) = A t y γ) t x + γ t p + y t q + y t b This implies that A t y = γ and y C Y, which implies that y t A p + y t q + y t b 0 for any p P, q C Y Proper separation also guarantees that there exists ˆp, ˆq, ˆα) S such that γ tˆp + y tˆq + π ˆα = y t Aˆp + y tˆq > 0 Let x be the Slater point of GP d ) and ˆx such that ˆx + ˆp P, Aˆx b + ˆq C Y, and c tˆx ˆα < z For all ξ sufficiently small, x + ξˆx + ˆp x ) P and Ax b + ξaˆx b + ˆq Ax b)) C Y Therefore 0 y t A x + ξˆx + ˆp x )) + y t Ax b + ξaˆx b + ˆq Ax b))) + y t b = ξ y t Aˆx y t Aˆp + y t Ax + y t Aˆx y t b + y tˆq y t Ax + y t b ) = ξ y t Aˆp + y tˆq ), 11

a contradiction, since ξ can be negative and y t Aˆp + y tˆq > 0 Therefore π 0, completing the proof Proof of Theorem 2: For simplicity, let z and z denote the primal and dual optimal objective values respectively The interesting case is when z <, otherwise weak duality implies that GP d ) is infeasible and z = z = If z < the point 0, 0, 0, 0) does not belong to the non-empty convex set S := { s, v, q, α) y, u st c A t y, u) + s, v) C, y + q C Y, b t y u + α > z } We use Proposition 10 to properly separate 0, 0, 0, 0) from S, which implies that there exists x, β, γ, δ) 0 such that x t s + βv + γ t q + δα 0 for all s, v, q, α) S Note that δ 0 because α is not upper bounded in the definition of S If δ > 0, re-scale x, β, γ, δ) such that δ = 1 For any y IR m, u IR, s, ṽ) C, q C Y, and ε > 0, define s = c+a t y+ s, v = u+ṽ, q = y+ q, and α = z b t y+u+ε By construction the point s, v, q, α) S and proper separation implies that for all y, u, s, ṽ) C, q C Y, and ε > 0 0 x t c + A t y + s) + β u + ṽ) + γ t y + q) + z b t y + u + ε = Ax b γ) t y + x, β) t s, ṽ) + 1 β)u + γ t q c t x + z + ε This implies that Ax b = γ C Y, β = 1, c t x z, and x, 1) C, which means that x P Therefore x is feasible for GP d ) and z c t x z z, which implies that z = z and the primal feasible point x attains the optimum If δ = 0, the same construction used above and proper separation gives the following inequality for all y, u, s, ṽ) C, q C Y 0 x t c + A t y + s) + β u + ṽ) + γ t y + q) = Ax γ) t y + x, β) t s, ṽ) βu + γ t q c t x This implies that Ax = γ C Y, β = 0, which means that x t s + x t A t q c t x 0 for any s, ũ) C and q C Y The proper separation also guarantees that there exists ŝ, ˆv, ˆq, ˆα) S such that x t ŝ + βˆv + γ tˆq = x t ŝ + x t A tˆq > 0 Let y, u ) be the Slater point of GD d ) and ŷ, û) such that c A t ŷ +ŝ, û+ ˆv) C, ŷ + ˆq CY, and b t ŷ û + ˆα > z Then for all ξ sufficiently small, we have that y + ξ ŷ + ˆq y ) CY and c A t y + ξ c A t ŷ + ŝ c + A t y ), u + ξ û + ˆv u ) ) C Therefore x t c A t y + ξ c A t ŷ + ŝ c + A t y )) + x t A t y + ξ ŷ + ˆq y )) c t x 0 12

Simplifying and canceling, we obtain 0 ξ x t A t ŷ + x t ŝ + x t A t y + x t A t ŷ + x t A tˆq x t A t y ) = ξ x t ŝ + x t A tˆq ), a contradiction, since ξ can be negative and x t ŝ+x t A tˆq > 0 Therefore δ 0, completing the proof Proof of Theorem 3: Equation 6) and > 0 imply that X d Assume that X d contains no Slater point, then relintc Y {Ax b x relintp } = and these nonempty convex sets can be separated using Proposition 10 Therefore there exists y 0 such that for any s C Y, x P we have y t s y t Ax b) Let u = y t b; from the inequality above we have that y CY and y t Ax + u 0 for any x P, which implies that A t y, u) C Define b ε = b + ε y ŷ, with ŷ given by Proposition 9 such that ŷ = 1 and ŷ t y = y Then the point y, u) is feasible for Problem A2 dε ) of Lemma 1 with data d ε = A, b ε, c) for any ε > 0 This implies that X dε = and therefore inf ε>0 d d ε = inf ε>0 = 0, a contradiction Proof of Theorem 4: Equation 13) and > 0 imply that Y d Assume that Y d contains no Slater point Consider the nonempty convex set S defined by: ε y S := { c A t y, u ) y relintc Y, u IR } No Slater point in the dual implies that relintc S = Therefore we can properly separate these two nonempty convex sets using Proposition 10, whereby there exists x, t) 0 such that for any s, v) C, y C Y, u IR we have x t s + tv x t c A t y ) + tu The above inequality implies that Ax C Y, c t x 0, x, t) C, and t = 0 This last fact implies that x 0 and x R Let ˆx be such that ˆx = 1 and ˆx t x = x see Proposition 9) For any ε > 0, define c ε = c ε ˆx Then the point x is feasible for x Problem B2 dε ) of Lemma 2 with data d ε = A, b, c ε ) This implies then that Y dε = and consequently inf ε>0 d d ε = inf ε>0 = 0, a contradiction The contrapositives of Theorems 3 and 4 are not true Consider for example the data [ ] ) ) 0 0 1 1 A =, b =, and c =, 0 0 0 0 and the sets C Y = IR + {0} and P = C X = IR + IR Problem GP d ) for this example has a Slater point at 1, 0), and = 0 perturbing by b = 0, ε) makes the problem infeasible for any ε) Problem GD d ) for the same example has a Slater point at 1, 0) and = 0 perturbing by c = 0, ε) makes the problem infeasible for any ε) 13 ε x

4 Characterization of ρ_P(d) and ρ_D(d) via Associated Optimization Problems

Equation (16) shows that to characterize ρ(d) for consistent data instances d ∈ F, it is sufficient to express ρ_P(d) and ρ_D(d) in a convenient form. Below we show that these distances to infeasibility can be obtained as the solutions of certain associated optimization problems. These results can be viewed as an extension to problems not in conic form of Theorem 3.5 of [17], and Theorems 1 and 2 of [8].

Theorem 5 Suppose that X_d ≠ ∅. Then ρ_P(d) = j_P(d) = r_P(d), where

j_P(d) = min { max{ ‖A^t y + s‖, |b^t y − u| } : ‖y‖ = 1, y ∈ C_Y*, (s, u) ∈ C* }   (17)

and

r_P(d) = min_{v ∈ R^m, ‖v‖ ≤ 1}  max_{x, t, θ} { θ : Ax − b t − θ v ∈ C_Y, ‖x‖ + t ≤ 1, (x, t) ∈ C }.   (18)

Theorem 6 Suppose that Y_d ≠ ∅. Then ρ_D(d) = j_D(d) = r_D(d), where

j_D(d) = min { max{ ‖Ax − p‖, |c^t x + g| } : ‖x‖ = 1, x ∈ R, p ∈ C_Y, g ≥ 0 }   (19)

and

r_D(d) = min_{v ∈ R^n, ‖v‖ ≤ 1}  max_{y, δ, θ} { θ : −A^t y + c δ − θ v ∈ R*, ‖y‖ + δ ≤ 1, y ∈ C_Y*, δ ≥ 0 }.   (20)

Proof of Theorem 5: Assume that j_P(d) > ρ_P(d); then there exists a data instance d̄ = (Ā, b̄, c̄) that is primal infeasible with ‖A − Ā‖ < j_P(d), ‖b − b̄‖ < j_P(d), and

‖c − c̄‖ < j_P(d). From Lemma 1 there is a point (ȳ, ū) that satisfies the following:

(−Ā^t ȳ, ū) ∈ C*,  b̄^t ȳ ≥ ū,  ȳ ≠ 0,  ȳ ∈ C_Y*.

Scale ȳ such that ‖ȳ‖ = 1; then (y, s, u) = (ȳ, −Ā^t ȳ, b̄^t ȳ) is feasible for (17) and

‖A^t y + s‖ = ‖A^t ȳ − Ā^t ȳ‖ ≤ ‖A − Ā‖ ‖ȳ‖ < j_P(d),
|b^t y − u| = |b^t ȳ − b̄^t ȳ| ≤ ‖b − b̄‖ ‖ȳ‖ < j_P(d).

In the first inequality above we used the fact that ‖A^t‖ = ‖A‖. Therefore j_P(d) ≤ max{‖A^t y + s‖, |b^t y − u|} < j_P(d), a contradiction.

Let us now assume that j_P(d) < γ < ρ_P(d) for some γ. This means that there exists (ȳ, s̄, ū) such that ȳ ∈ C_Y*, ‖ȳ‖ = 1, (s̄, ū) ∈ C*, and that ‖A^t ȳ + s̄‖ < γ, |b^t ȳ − ū| < γ. From Proposition 9, consider ŷ such that ‖ŷ‖ = 1 and ŷ^t ȳ = ‖ȳ‖ = 1, and define, for ε > 0,

Ā = A − ŷ (A^t ȳ + s̄)^t   and   b̄_ε = b − ŷ (b^t ȳ − ū − ε).

We have that ȳ ∈ C_Y*, Ā^t ȳ = −s̄, b̄_ε^t ȳ = ū + ε > ū, and (−Ā^t ȳ, ū) ∈ C*. This implies that for any ε > 0, the problem (A2_{d_ε}) in Lemma 1 is feasible with data d_ε = (Ā, b̄_ε, c). Lemma 1 then implies that X_{d_ε} = ∅ and therefore ρ_P(d) ≤ ‖d − d_ε‖. To finish the proof we compute the size of the perturbation:

‖A − Ā‖ = ‖ŷ ((A^t ȳ)^t + s̄^t)‖ ≤ ‖A^t ȳ + s̄‖ ‖ŷ‖ < γ,
‖b − b̄_ε‖ = |b^t ȳ − ū − ε| ‖ŷ‖ ≤ |b^t ȳ − ū| + ε < γ + ε,

which implies ρ_P(d) ≤ ‖d − d_ε‖ = max{‖A − Ā‖, ‖b − b̄_ε‖} < γ + ε < ρ_P(d) for ε small enough. This is a contradiction, whereby j_P(d) = ρ_P(d).

To prove the other characterization, we note that θ ≥ 0 in Problem (18) and invoke Lemma 6 to rewrite it as

r_P(d) = min_{v ∈ R^m, ‖v‖ ≤ 1}  min { max{ ‖A^t y + s‖, |b^t y − u| } : y^t v ≥ 1, y ∈ C_Y*, (s, u) ∈ C* }.

The above problem can be written as the following equivalent optimization problem:

r_P(d) = min { max{ ‖A^t y + s‖, |b^t y − u| } : ‖y‖ ≥ 1, y ∈ C_Y*, (s, u) ∈ C* }.

The equivalence of these problems is verified by combining the minimization operations in the first problem and using the Cauchy-Schwarz inequality. The converse makes use of Proposition 9. To finish the proof, we note that if (y, s, u) is optimal for this last problem then it also satisfies ‖y‖ = 1, whereby making it equivalent to (17). Therefore

r_P(d) = min { max{ ‖A^t y + s‖, |b^t y − u| } : ‖y‖ = 1, y ∈ C_Y*, (s, u) ∈ C* } = j_P(d).

Proof of Theorem 6: Assume that j_D(d) > ρ_D(d); then there exists a data instance d̄ = (Ā, b̄, c̄) that is dual infeasible with ‖A − Ā‖ < j_D(d), ‖b − b̄‖ < j_D(d), and ‖c − c̄‖ < j_D(d). From Lemma 2 there exists x̄ ∈ R such that x̄ ≠ 0, Āx̄ ∈ C_Y, and c̄^t x̄ ≤ 0. We can scale x̄ such that ‖x̄‖ = 1. Then (x, p, g) = (x̄, Āx̄, −c̄^t x̄) is feasible for (19), and

‖Ax − p‖ = ‖Ax̄ − Āx̄‖ ≤ ‖A − Ā‖ ‖x̄‖ < j_D(d),
|c^t x + g| = |c^t x̄ − c̄^t x̄| ≤ ‖c − c̄‖ ‖x̄‖ < j_D(d).

Therefore j_D(d) ≤ max{‖Ax − p‖, |c^t x + g|} < j_D(d), which is a contradiction.

Assume now that j_D(d) < γ < ρ_D(d) for some γ. Then there exists (x̄, p̄, ḡ) such that x̄ ∈ R, ‖x̄‖ = 1, p̄ ∈ C_Y, and ḡ ≥ 0, and that ‖Ax̄ − p̄‖ ≤ γ and |c^t x̄ + ḡ| ≤ γ. From Proposition 9, consider x̂ such that ‖x̂‖ = 1 and x̂^t x̄ = ‖x̄‖ = 1, and define:

Ā = A − (Ax̄ − p̄) x̂^t   and   c̄_ε = c − x̂ (c^t x̄ + ḡ + ε),  for ε > 0.

By construction Āx̄ = p̄ ∈ C_Y and c̄_ε^t x̄ = −ḡ − ε < 0 for any ε > 0. Therefore Problem (B2_{d_ε}) in Lemma 2 is feasible for data d_ε = (Ā, b, c̄_ε), which implies that Y_{d_ε} = ∅. We can then bound ρ_D(d) as follows:

ρ_D(d) ≤ ‖d − d_ε‖ = max{ ‖(Ax̄ − p̄) x̂^t‖, ‖x̂ (c^t x̄ + ḡ + ε)‖ } ≤ max{γ, γ + ε} = γ + ε < ρ_D(d)

for ε small enough, which is a contradiction. Therefore ρ_D(d) = j_D(d).

To prove the other characterization, we note that θ ≥ 0 in Problem (20) and invoke Lemma 6 to rewrite it as

r_D(d) = min_{v ∈ R^n, ‖v‖ ≤ 1}  min { max{ ‖Ax − p‖, |c^t x + g| } : x^t v ≥ 1, x ∈ R, p ∈ C_Y, g ≥ 0 }.

The above problem can be written as the following equivalent optimization problem:

r_D(d) = min { max{ ‖Ax − p‖, |c^t x + g| } : ‖x‖ ≥ 1, x ∈ R, p ∈ C_Y, g ≥ 0 }.

The equivalence of these problems is verified by combining the minimization operations in the first problem and using the Cauchy-Schwarz inequality. The converse makes use of Proposition 9. To finish the proof, we note that if (x, p, g) is optimal for this last problem then it also satisfies ‖x‖ = 1, whereby making it equivalent to (19). Therefore

r_D(d) = min { max{ ‖Ax − p‖, |c^t x + g| } : ‖x‖ = 1, x ∈ R, p ∈ C_Y, g ≥ 0 } = j_D(d).

5 Geometric Properties of the Primal and Dual Feasible Regions

In Section 3 we showed that a positive primal and/or dual distance to infeasibility implies the existence of a primal and/or dual Slater point, respectively. We now show that a positive distance to infeasibility also implies that the corresponding feasible region has a reliable solution. We consider a solution in the relative interior of the feasible region to be a reliable solution if it has good geometric properties: it is not too far from a given reference point, its distance to the relative boundary of the feasible region is not too small, and the ratio of these two quantities is not too large, where these quantities are bounded by appropriate condition numbers.

5.1 Distance to Relative Boundary, Minimum Width of a Cone

An affine set T is the translation of a vector subspace L, i.e., T = a + L for some a. The minimal affine set that contains a given set S is known as the affine hull of S. We denote the affine hull of S by L_S; it is characterized as:

L_S = { Σ_{i∈I} α_i x_i : α_i ∈ R, x_i ∈ S, i ∈ I, Σ_{i∈I} α_i = 1, I a finite set },

see Section 1 in [18]. We denote by L̄_S the vector subspace obtained when the affine hull L_S is translated to contain the origin; i.e., for any x ∈ S, L̄_S = L_S − x. Note that if 0 ∈ S then L_S is a subspace. Many results in this section involve the distance of a point x ∈ S to the relative boundary of the set S, denoted by dist(x, rel ∂S), defined as follows:

Definition 2 Given a non-empty set S and a point x ∈ S, the distance from x to the relative boundary of S is

dist(x, rel ∂S) := inf { ‖x̄ − x‖ : x̄ ∈ L_S \ S }.   (21)

Note that if S is an affine set (and in particular if S is the singleton S = {s}), then dist(x, rel ∂S) = ∞ for each x ∈ S. We use the following definition of the min-width of a convex cone:

Definition 3 For a convex cone K, the min-width of K is defined by

τ_K := sup { dist(y, rel ∂K)/‖y‖ : y ∈ K, y ≠ 0 }

for K ≠ {0}, and τ_K := ∞ if K = {0}.

The measure τ_K maximizes the ratio of the radius of a ball contained in the relative interior of K and the norm of its center, and so it intuitively corresponds to half of the vertex angle of the widest cylindrical cone contained in K. The quantity τ_K was called the inner measure of K for Euclidean norms in Goffin [9], and has been used more recently for general norms in analyzing condition measures for conic convex optimization, see [6]. Note that if K is not a subspace, then τ_K ∈ (0, 1], and τ_K is attained for some y^0 ∈ relint K satisfying ‖y^0‖ = 1, as well as along the ray αy^0 for all α > 0; and τ_K takes on larger values to the extent that K has larger minimum width. If K is a subspace, then τ_K = ∞.

5.2 Geometric Properties of the Feasible Region of (GP_d)

In this subsection we present results concerning geometric properties of the feasible region X_d of (GP_d). We defer all proofs to the end of the subsection. The following proposition is an extension of Lemma 3.2 of [16] to the ground-set model format.
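As a simple illustration of Definition 3 (for orientation only), take K = R^n_+ equipped with the Euclidean norm. For y > 0 one has dist(y, rel ∂K) = min_i y_i, so τ_K = max_{y ≥ 0, y ≠ 0} min_i y_i / ‖y‖_2 = 1/√n, attained at y^0 = (1, ..., 1)/√n. Under the ℓ_∞ norm the same cone has τ_K = 1, which illustrates the dependence of the min-width on the choice of norm.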

Proposition 4 Consider any x = x̂ + r feasible for (GP_d) such that x̂ ∈ P and r ∈ R. If ρ_D(d) > 0 then

‖r‖ ≤ (1/ρ_D(d)) max { ‖Ax̂ − b‖, c^t r }.

The following result is an extension of Assertion 1 of Theorem 1.1 of [16] to the ground-set model format of (GP_d):

Proposition 5 Consider any x^0 ∈ P. If ρ_P(d) > 0 then there exists x̄ ∈ X_d satisfying

‖x̄ − x^0‖ ≤ (dist(Ax^0 − b, C_Y)/ρ_P(d)) max { 1, ‖x^0‖ }.

The following is the main result of this subsection, and can be viewed as an extension of Theorems 15, 17, and 19 of [8] to the ground-set model format of (GP_d). In Theorem 7 we assume for expository convenience that P is not an affine set and C_Y is not a subspace. These assumptions are relaxed in Theorem 8.

Theorem 7 Suppose that P is not an affine set, C_Y is not a subspace, and consider any x^0 ∈ P. If ρ_P(d) > 0 then there exists x̄ ∈ X_d satisfying:

1. (a) ‖x̄ − x^0‖ ≤ max{1, ‖x^0‖} (‖Ax^0 − b‖ + ‖A‖)/ρ_P(d)

   (b) ‖x̄‖ ≤ ‖x^0‖ + (‖Ax^0 − b‖ + ‖A‖)/ρ_P(d)

2. (a) 1/dist(x̄, rel ∂P) ≤ (1/dist(x^0, rel ∂P)) (1 + (‖Ax^0 − b‖ + ‖A‖)/ρ_P(d))

   (b) 1/dist(x̄, rel ∂X_d) ≤ (1/min{dist(x^0, rel ∂P), τ_{C_Y}}) (1 + (‖Ax^0 − b‖ + ‖A‖)/ρ_P(d))

3. (a) ‖x̄ − x^0‖/dist(x̄, rel ∂P) ≤ (1/dist(x^0, rel ∂P)) max{1, ‖x^0‖} (‖Ax^0 − b‖ + ‖A‖)/ρ_P(d)

   (b) ‖x̄ − x^0‖/dist(x̄, rel ∂X_d) ≤ (1/min{dist(x^0, rel ∂P), τ_{C_Y}}) max{1, ‖x^0‖} (‖Ax^0 − b‖ + ‖A‖)/ρ_P(d)

   (c) ‖x̄‖/dist(x̄, rel ∂P) ≤ (1/dist(x^0, rel ∂P)) (‖x^0‖ + (‖Ax^0 − b‖ + ‖A‖)/ρ_P(d))

   (d) ‖x̄‖/dist(x̄, rel ∂X_d) ≤ (1/min{dist(x^0, rel ∂P), τ_{C_Y}}) (‖x^0‖ + (‖Ax^0 − b‖ + ‖A‖)/ρ_P(d))

The statement of Theorem 8 below relaxes the assumptions on P and C_Y not being affine and/or linear spaces:

Theorem 8 Consider any x^0 ∈ P. If ρ_P(d) > 0 then there exists x̄ ∈ X_d with the following properties:

If P is not an affine set, x̄ satisfies all items of Theorem 7.

If P is an affine set and C_Y is not a subspace, x̄ satisfies all items of Theorem 7, where items 2(a), 3(a), and 3(c) are vacuously valid as both sides of these inequalities are zero.

If P is an affine set and C_Y is a subspace, x̄ satisfies all items of Theorem 7, where items 2(a), 2(b), 3(a), 3(b), 3(c), and 3(d) are vacuously valid as both sides of these inequalities are zero.

We conclude this subsection by presenting a result which captures the thrust of Theorems 7 and 8, emphasizing how the distance to infeasibility and the geometric properties of a given point x^0 ∈ P bound various geometric properties of the feasible region X_d. For x^0 ∈ P, define the following measure:

g_{P,C_Y}(x^0) := max{‖x^0‖, 1} / min{1, dist(x^0, rel ∂P), τ_{C_Y}}.

Also define the following geometric measure of the feasible region X_d:

g_{X_d} := min_{x ∈ X_d}  max { ‖x‖, ‖x‖/dist(x, rel ∂X_d), 1/dist(x, rel ∂X_d) }.

The following is an immediate consequence of Theorems 7 and 8.

Corollary 2 Consider any x^0 ∈ P. If ρ_P(d) > 0 then

g_{X_d} ≤ g_{P,C_Y}(x^0) (1 + (‖Ax^0 − b‖ + ‖A‖)/ρ_P(d)).
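To relate Corollary 2 back to the condition number (an observation for illustration, under the additional assumption that 0 ∈ P), take x^0 = 0. Then g_{P,C_Y}(0) = 1/min{1, dist(0, rel ∂P), τ_{C_Y}}, and since ‖b‖ ≤ ‖d‖, ‖A‖ ≤ ‖d‖, and ρ_P(d) ≥ ρ(d), Corollary 2 yields

g_{X_d} ≤ (1 + 2 C(d)) / min{1, dist(0, rel ∂P), τ_{C_Y}},

so the geometric measure of the feasible region is controlled by the condition number together with the geometry of the ground-set and of C_Y.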

We now proceed with proofs of these results Proof of Proposition 4: If r = 0 the result is true If r 0, then Proposition 9 shows that there exists ˆr such that ˆr = 1 and ˆr t r = r For any ε > 0 define the following perturbed problem instance: Ā = A + 1 r Aˆx b) c ˆrt, b t r) + ε = b, c = c + ˆr r Note that, for the data d = Ā, b, c), the point r satisfies B2 d) in Lemma 2, and therefore GD d) is infeasible We conclude that d d, which implies max { Aˆx b, ct r) + + ε} r and so max { Aˆx b, ct r} r The following technical lemma, which concerns the optimization problem P P ) below, is used in the subsequent proofs Problem P P ) is parametrized by given points x 0 P and w 0 C Y, and is defined by P P ) max x,t,w,θ θ st Ax bt w = θ b Ax 0 + w 0 ) x + t 1 x, t) C w C Y 22) Lemma 3 Consider any x 0 P and w 0 C Y such that Ax 0 w 0 b If > 0, then there exists a point x, t, w, θ) feasible for problem P P ) that satisfies θ b Ax 0 + w 0 > 0 23) Proof: Note that problem P P ) is feasible for any x 0 and w 0 since x, t, w, θ) = 0, 0, 0, 0) is always feasible, therefore it can either be unbounded or have a finite optimal objective value If P P ) is unbounded, we can find feasible points with an objective function large enough such that 23) holds If P P ) has a finite optimal value, say θ, then it follows from elementary arguments that it attains its optimal value Since > 0 implies X d, Theorem 5 implies that the optimal solution x, t, w, θ ) for P P ) satisfies 23) Proof of Proposition 5: Assume Ax 0 b C Y, otherwise x = x 0 satisfies the proposition We consider problem P P ), defined by 22), with x 0 and w 0 C Y such 21

that Ax 0 b w 0 = distax 0 b, C Y ) From Lemma 3 we have that there exists a point x, t, w, θ) feasible for P P ) that satisfies Define θ b Ax 0 + w 0 = distax 0 b, C Y ) x = x + θx0 and w = w + θw0 t + θ t + θ By construction we have x P, A x b = w C Y, therefore x X d, and x x 0 = x tx0 t + θ x + t) max{1, x0 } θ distax0 b, C Y ) max { 1, x 0 } Proof of Theorem 7: Note that > 0 implies X d ; note also that is finite, otherwise Proposition 2 shows that C Y = IR m which is a subspace Set w 0 C Y such that w 0 = A and τ CY = distw0,rel C Y ) We also assume that Ax 0 b w 0, otherwise w 0 we can show that x = x 0 satisfies the theorem Let r w 0 = distw 0, rel C Y ) = A τ CY and let also r x 0 = distx 0, rel P ) We invoke Lemma 3 with x 0 and w 0 above to obtain a point x, t, w, θ), feasible for P P ) and that from inequality 23) satisfies Define the following: 0 < 1 θ Ax0 b + A 24) x = x + θx0 t + θ, w + θw0 w = t + θ, r x = θr x0 t + θ, r w = θτ C Y t + θ By construction dist x, rel P ) r x, dist w, rel C Y ) r w A, and A x b = w C Y Therefore the point x X d We now bound its distance to the relative boundary of the feasible region Consider any v L P {y Ay L CY } such that v 1, then x + αv P, for any α r x, and A x + αv) b = w + αav) C Y, for any α r w Therefore x + αv) X d for any α min {r x, r w }, and the distance to the relative boundary of X d is then dist x, rel X d ) α v α, for any α min {r x, r w } Therefore dist x, rel X d ) min {r x, r w } θ min{r x 0,τ CY } t+θ To finish the proof, we just have to bound the different expressions from the statement of the theorem; here we make use of inequality 24): 22

1 a) x x 0 = x tx0 t + θ 1 θ max{1, x0 } Ax0 b + A max{1, x 0 } b) x 1 θ x + x0 1 θ + x0 x 0 + Ax0 b + A 1 2 a) dist x, rel P ) 1 = t + θ 1 1 + 1 ) 1 ) 1 + Ax0 b + A r x θr x 0 r x 0 θ r x 0 1 b) dist x, rel X d ) 1 t + θ 1 1 + 1 ) min{r x 0, τ CY } θ min{r x 0, τ CY } θ ) 1 1 + Ax0 b + A min{r x 0, τ CY } 3 a) b) c) d) x x 0 dist x, rel P ) x tx0 θr x 0 1 r x 0 x x 0 dist x, rel X d ) 1 r x 0 Ax 0 b + A 1 θ max{1, x0 } x tx 0 θ min{r x 0, τ CY } 1 1 min{r x 0, τ CY } x dist x, rel P ) x + θx0 θr x 0 1 r x 0 x dist x, rel X d ) 1 r x 0 max{1, x 0 } 1 min{r x 0, τ CY } θ max{1, x0 } Ax 0 b + A max{1, x 0 } x 0 + 1 ) θ x 0 + Ax0 b + A x + θx0 θ min{r x 0, τ CY } 1 1 min{r x 0, τ CY } ) x 0 + 1 ) min{r x 0, τ CY } θ ) x 0 + Ax0 b + A Finally, we note that Theorem 8 can be proved using almost identical arguments as in the proof of Theorem 7, but with a careful analysis to handle the special cases when P is an affine set or C Y is a subspace, see [12] for exact details 53 Solutions in the relative interior of Y d In this subsection we present results concerning geometric properties of the dual feasible region Y d of GD d ) We defer all proofs to the end of the subsection Before proceeding, 23

we first discuss norms that arise when studying the dual problem Motivated quite naturally by 18), we define the norm x, t) := x + t for points x, t) C IR n IR This then leads to the following dual norm for points s, u) C IR n IR: s, u) := max{ s, u } 25) Consistent with the characterization of given by 20) in Theorem 6, we define the following dual norm for points y, δ) IR m IR: y, δ) := y + δ 26) It is clear that the above defines a norm on the vector space IR m IR which contains Y d The following proposition bounds the norm of the y component of the dual feasible solution y, u) in terms of the objective function value b t y u; it corresponds to Lemma 31 of [16] for the ground-set model format Proposition 6 Consider any y, u) feasible for GD d ) If > 0 then y max { c, b t y u)} The following result corresponds to Assertion 1 of Theorem 11 of [16] for the groundset model format dual problem GD d ): Proposition 7 Consider any y 0 C Y If > 0 then for any ε > 0, there exists ȳ, ū) Y d satisfying ȳ y 0 distc At y 0, R ) + ε max { 1, y 0 } The following is the main result of this subsection, and can be viewed as an extension of Theorems 15, 17, and 19 of [8] to the dual problem GD d ) In Theorem 9 we assume for expository convenience that C Y is not a subspace and that R the recession cone of P ) is not a subspace These assumptions are relaxed in Theorem 10 Theorem 9 Suppose that R and C Y are not subspaces and consider any y 0 CY If > 0 then for any ε > 0, there exists ȳ, ū) Y d satisfying: 1 a) ȳ y 0 c At y 0 + A max{1, y 0 } 24

b) ȳ y 0 + c At y 0 + A 1 2 a) distȳ, rel CY ) 1 1 + c At y 0 ) + A disty 0, rel CY ) 1 b) distȳ, ū), rel Y d ) 1 + ε) max{1, A } 1 + c At y 0 ) + A min {disty 0, rel CY ), τ R } ȳ y 0 3 a) distȳ, rel CY ) 1 c A t y 0 ) + A max{1, y 0 disty 0, rel CY } ) b) c) d) ȳ y 0 distȳ, ū), rel Y d ) 1 + ε) max{1, A } c A t y 0 ) + A max{1, y 0 min {disty 0, rel CY } ), τ R } ȳ distȳ, rel CY ) 1 y 0 disty 0, rel CY + c At y 0 ) + A ) ȳ distȳ, ū), rel Y d ) 1 + ε) max{1, A } y 0 min {disty 0, rel CY + c At y 0 ) + A ), τ R } The statement of Theorem 10 below relaxes the assumptions on R and C Y not being linear subspaces: Theorem 10 Consider any y 0 C Y If > 0 then for any ε > 0, there exists ȳ, ū) Y d with the following properties: If C Y is not a subspace, ȳ, ū) satisfies all items of Theorem 9 If C Y is a subspace and R is not a subspace, ȳ, ū) satisfies all items of Theorem 9, where items 2a), 3a), and 3c) are vacuously valid as both sides of these inequalities are zero If C Y and R are subspaces, ȳ, ū) satisfies items 1a), 1b), 2a), 3a), and 3c) of Theorem 9, where items 2a), 3a), and 3c) are vacuously valid as both sides of these inequalities are zero The point ȳ, ū) also satisfies 2 b) 1 distȳ, ū), rel Y d ) ε 3 b) ȳ y 0 distȳ, ū), rel Y d ) ε 25

3 d) ȳ distȳ, ū), rel Y d ) ε We conclude this subsection by presenting a result which captures the thrust of Theorems 9 and 10, emphasizing how the distance to dual infeasibility and the geometric properties of a given point y 0 C Y bound various geometric properties of the dual feasible region Y d For y 0 relintc Y, define: g C Y,R y0 ) := max{ y 0, 1} min{1, disty 0, rel C Y ), τ R } We now define a geometric measure for the dual feasible region We do not consider the whole set Y d ; instead we consider only the projection onto the variables y Let ΠY d denote the projection of Y d onto the space of the y variables: ΠY d := {y IR m there exists u IR for which y, u) Y d } 27) Note that the set ΠY d corresponds exactly to the feasible region in the alternate formulation of the dual problem 15) We define the following geometric measure of the set ΠY d : { g Yd := inf max y,u) Y d y, } y disty, rel ΠY d ), 1 disty, rel ΠY d ) Corollary 3 Consider any y 0 CY If > 0 then g Yd max{1, A }g C Y,R y0 ) 1 + c At y 0 ) + A Proof: We show in Lemma 4, item 4, that for any ȳ, ū) Y d, distȳ, rel ΠY d ) distȳ, ū), rel Y d ) If either C Y or R is not a subspace, use items 1b), 2b), and 3d) from Theorem 9 and apply the definition of g Yd to obtain g Yd 1 + ε) max{1, A }g C Y,R y0 ) 1 + c At y 0 ) + A Since now the left side is independent of ε, take the limit as ε 0 If both C Y are subspaces we obtain the stronger bound g Yd g C Y,R y0 ) 1 + c At y 0 ) + A and R by using item 1b) from Theorem 9, items 2 b) and 3 d) from Theorem 10, and the definition of g Yd 26

We now state Lemma 4, we start by defining the following set: Ỹ d := { y, u) IR m IR c A t y, u) C } 28) Note that the dual feasible region Y d is recovered from Ỹd as Y d = Ỹd C Y IR) The following lemma, whose proof is deferred to the Appendix, relates a variety of distances to relative boundaries of sets arising in the dual problem: Lemma 4 Given a dual feasible point y, u) Y d, let s = c A t y effdom u ) Then: 1 dist y, u), rel CY IR)) = dist y, rel CY ) 2 dist ) y, u), rel Ỹd 1 dists, u), max{1, A } rel C ) 3 dist y, u), rel Y d ) 1 max{1, A } min {dist s, u), rel C ), dist y, rel C Y )} 4 disty, rel ΠY d ) disty, u), rel Y d ) We now proceed with the proofs of the results of this subsection Proof of Proposition 6: If y = 0 the result is true If y 0, then Proposition 9 shows that there exists ŷ such that ŷ = 1 and ŷ t y = y For any ε > 0, define the following perturbed problem instance: Ā = A 1 y ŷc t, b = b + b t y + u) + + ε) y ŷ, c = c We note that, for the data d = Ā, b, c), the point y, u) satisfies A2 d) in Lemma 1, and therefore GP d) is infeasible We conclude that d d, which implies and so max { c, b t y + u) + + ε} y max { c, b t y u)} y The following technical lemma, which concerns the optimization problem DP ) below, is used in the subsequent proofs Problem DP ) is parameterized by given points y 0 CY and s 0 R, and is defined by DP ) max y,δ,s,θ θ st A t y + δc s = θ A t y 0 c + s 0 ) y + δ 1 y CY δ 0 s R 29) 27

Lemma 5 Consider any y 0 C Y and s 0 R such that A t y 0 + s 0 c If > 0, then there exists a point y, δ, s, θ) feasible for problem DP ) that satisfies θ c A t y 0 s 0 > 0 30) Proof: Note that problem DP ) is feasible for any y 0 and s 0 since y, δ, s, θ) = 0, 0, 0, 0) is always feasible Therefore it can either be unbounded or have a finite optimal objective value If DP ) is unbounded, we can find feasible points with an objective function large enough such that 30) holds If DP ) has a finite optimal value, say θ, then it follows from elementary arguments that it attains this value Since > 0 implies Y d, Theorem 6 implies that the optimal solution y, δ, s, θ ) for DP ) satisfies 30) Proof of Proposition 7: Assume c A t y 0 relintr, otherwise from Proposition 1, the point ȳ, ū) = y 0, uc A t y 0 )) satisfies the assertion of the proposition We consider problem DP ), defined by 29), with y 0 and s 0 relintr such that c A t y 0 s 0 distc A t y 0, R )+ε From Lemma 5 we have that there exists a point y, δ, s, θ) feasible for DP ) that satisfies Define θ c A t y 0 s 0 ȳ = y + θy0 δ + θ distc A t y 0, R ) + ε and s = s + θs0 δ + θ By construction we have ȳ C Y, c A t ȳ = s relintr Therefore from Proposition 1 ȳ, uc A t ȳ)) Y d, and letting ξ = max{1, y 0 } we have ȳ y 0 = y δy0 δ + θ y + δ)ξ θ distc At y 0, R ) + ε ξ Proof of Theorem 9: Note that > 0 implies Y d ; note also that is finite, otherwise Proposition 2 shows that R = {0} which is a subspace Set s 0 R such that s 0 = A and τ R = dists0,rel R ) s 0 We also assume for now that c A t y 0 s 0 We show later in the proof how to handle the case when c A t y 0 = s 0 Denote r y 0 = disty 0, rel CY ), and r s 0 = dists 0, rel R ) = τ R A > 0 With the points y 0 and s 0, use Lemma 5 to obtain a point y, δ, s, θ) feasible for DP ) such that from inequality 30) satisfies 0 < 1 θ c At y 0 + A 31) 28

Define the following: ȳ = y + θy0 δ + θ, s + θs0 s = δ + θ, r ȳ = θr y0 δ + θ, r s = θr s0 δ + θ By construction distȳ, rel CY ) rȳ, dist s, rel R ) r s, and c A t ȳ = s Therefore, from Proposition 1 the point ȳ, u s)) Y d We now choose ū so that ȳ, ū) Y d and bound its distance to the relative boundary Since relint R effdom u ), from Proposition 11 and Proposition 12, we have that for any ε > 0, the ball B s, ) r s LR 1+ε relint effdom u ) Define the function µ, ) by µ s, κ) := 1 A sup s us) s s κ s R Note that µ, ) is finite for every s relint effdom u ) and κ [0, dist s, rel R )), because it is defined as the supremum of the continuous function u ) over a closed and bounded subset contained in the relative interior of its effective domain, see Theorem ), and since ū r s A 1+ε) 101 of [18] We define ū = µ s, r s + u s) u s) the 1+ε point ȳ, ū) Y d Let us now bound distȳ, ū), rel Y d ) Consider any vector v L C Y {y A t y L R } such that v 1, then ȳ + αv C Y for any α rȳ, and c A t ȳ + αv) = s + α A t v) B ) r s s, L R 1 + ε for any α r s A 1 + ε) This last inclusion implies that c A t ȳ + αv), ū + β) = s + α A t v), ū + β) C r s for any α, β We have shown that distȳ, A 1+ε) rel C Y ) rȳ and distc A t ȳ, ū), rel C ) Therefore item 3 of Lemma 4 implies r s A 1+ε) distȳ, ū), rel Y d ) = { } 1 max{1, A } min r s rȳ, A 1 + ε) { } 1 θ 1 + ε) max{1, A } δ + θ min r s 0 r y 0, A θ min{r y 0, τ R } 1 + ε) max{1, A }δ + θ) To finish the proof, we bound the different expressions in the statement of the theorem; let ξ = max{1, A } to simplify notation Here we use inequality 31): 29

1 a) ȳ y 0 = y δy0 δ + θ max{1, y0 } θ c At y 0 + A b) ȳ 1 θ y + y 0 1 θ + y0 y 0 + c At y 0 + A 2 a) b) 1 distȳ, rel C Y ) 1 rȳ 1 distȳ, ū), rel Y d ) = δ + θ θr y 0 max{1, y 0 } 1 1 + 1 ) 1 1 + c At y 0 ) + A r y 0 θ r y 0 ) 1 + ε)ξ δ + θ min{r y 0, τ R } θ 1 + ε)ξ min{r y 0, τ R } 1 + ε)ξ min{r y 0, τ R } 1 + c At y 0 + A 1 + 1 θ ) 3 a) b) c) d) ȳ y 0 distȳ, rel CY ) y δy0 θr y 0 1 r y 0 1 r y 0θ max{1, y0 } c A t y 0 + A ȳ y 0 distȳ, ū), rel Y d ) 1 + ε)ξ y δy0 θ min{r y 0, τ R } 1 + ε)ξ ȳ distȳ, rel CY ) y + θy0 θr y 0 1 r y 0 max{1, y 0 } 1 + ε)ξ 1 min{r y 0, τ R } θ max{1, y0 } c A t y 0 + A max{1, y 0 } min{r y 0, τ R } 1 y 0 + 1 ) r y 0 θ y 0 + c At y 0 + A ȳ distȳ, ū), rel Y d ) y + θy0 1 + ε)ξ θ min{r y 0, τ R } 1 + ε)ξ min{r y 0, τ R } ) 1 + ε)ξ min{r y 0, τ R } y 0 + 1 ) θ ) y 0 + c At y 0 + A For the case c A t y 0 = s 0, define ȳ = y 0 and ū = µ ) s 0, τ R A 1+ε The proof then proceeds exactly as above, except that now we show that distc A t ȳ, ū), rel C ) τ R, which implies that distȳ, ū), rel Y 1 1+ε d) min{τ max{1, A }1+ε) R, r y0} from item 3 of Lemma 4 This inequality is then used to prove each item in the theorem Finally, we note that Theorem 10 can be proved using almost identical arguments as in the proof of Theorem 9, but with a careful analysis to handle the special cases when R or C Y are subspaces, see [12] for the exact details 30