Journal of Complexity. On strata of degenerate polyhedral cones, II: Relations between condition measures


Journal of Complexity 26 (2010) 209–226

Contents lists available at ScienceDirect: Journal of Complexity. Journal homepage: www.elsevier.com/locate/jco

On strata of degenerate polyhedral cones, II: Relations between condition measures

Dennis Cheung (a), Felipe Cucker (b,*), Javier Peña (c)

(a) United International College, Tang Jia Wan, Zhuhai, Guangdong Province, PR China
(b) Department of Mathematics, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong
(c) Tepper School of Business, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213-3890, USA

(*) Corresponding author. E-mail addresses: dennisc@uic.edu.hk (D. Cheung), macucker@cityu.edu.hk (F. Cucker), jfp@andrew.cmu.edu (J. Peña).

Article history: Received 2 January 2009; accepted 27 October 2009; available online 10 November 2009.

Keywords: linear programming; complementarity problems; condition numbers.

Abstract. In a paper (Cheung, Cucker and Peña (in press) [5]) that can be seen as the first part of this one, we extended the well-known condition numbers for polyhedral conic systems $C(A)$ (Renegar (1994, 1995) [7–9]) and $\mathscr{C}(A)$ (Cheung and Cucker (2001) [3]) to versions $\overline{C}(A)$ and $\overline{\mathscr{C}}(A)$ that are finite for all input matrices $A \in \mathbb{R}^{n\times m}$. In this paper we compare $\overline{C}(A)$ and $\overline{\mathscr{C}}(A)$ with other condition measures for the same problem that are also always finite. © 2009 Elsevier Inc. All rights reserved.

0885-064X/$ - see front matter © 2009 Elsevier Inc. All rights reserved. doi:10.1016/j.jco.2009.11.001

1. Introduction

Consider the problem of, given a matrix $A \in \mathbb{R}^{n\times m}$ (with $n \ge m$), deciding whether the system $Ay \ge 0$ has non-zero solutions. The set $K(A)$ of solutions of such a system is a closed pointed polyhedral cone in $\mathbb{R}^m$. Let $d(A) = \dim K(A)$ be its dimension. When $d(A) \in \{1, 2, \ldots, m-1\}$, arbitrarily small perturbations $\tilde A$ of the data $A$ can turn the dimension $d(\tilde A)$ of the resulting cone to be zero and hence can change the output of the problem above from "Yes" to "No".

To analyze both the complexity and the accuracy (under finite precision arithmetic) of a number of algorithms solving our problem, Renegar [7–9] defined a condition number $C(A)$ as the reciprocal of the normalized distance from $A$ to the set $\Sigma$ of ill-posed inputs. This set $\Sigma$ consists precisely of those matrices $A$ for which $d(A) \in \{1, 2, \ldots, m-1\}$. A related condition number, denoted $\mathscr{C}(A)$, was introduced in [3]. Roughly speaking, $C(A)$ is defined in terms of the geometry of the space $\mathbb{R}^{n\times m}$ of data and $\mathscr{C}(A)$ in terms of the geometry in the space $\mathbb{R}^m$ of solutions (e.g., $1/\mathscr{C}(A)$ is the opening of $K(A)$ when $d(A) = m$).
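As a concrete illustration of this degeneracy (a worked example added in this transcription, not taken from the original paper), let $n = 3$, $m = 2$ and
$$A = \begin{pmatrix}1 & 0\\ -1 & 0\\ 0 & 1\end{pmatrix},\qquad K(A) = \{y\in\mathbb{R}^2 : y_1\ge 0,\ -y_1\ge 0,\ y_2\ge 0\} = \{0\}\times\mathbb{R}_+,$$
so $d(A) = 1$ and $A\in\Sigma$. For any $\varepsilon > 0$, the perturbation
$$A_\varepsilon = \begin{pmatrix}1 & 0\\ -1 & -\varepsilon\\ 0 & 1\end{pmatrix}$$
forces $y_1 = 0$ and then both $y_2\ge 0$ and $\varepsilon y_2\le 0$, so $K(A_\varepsilon) = \{0\}$: an arbitrarily small perturbation changes the answer from "Yes" to "No".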

The main result in [3], though, characterizes $\mathscr{C}(A)$ in terms of a kind of column-wise normalized distance to ill-posedness. In particular, both $C(A)$ and $\mathscr{C}(A)$ are infinite when (and only when) $A\in\Sigma$.

Independently of the above, other condition measures were developed which, in contrast with $C(A)$ and $\mathscr{C}(A)$, are finite for all data $A$. A notable example is $\sigma(A)$, introduced by Ye in [10]. Such condition measures have been used for the complexity analysis of infinite precision algorithms and yield sharper complexity bounds in the sense that these bounds are always finite.

Recently, in [5], we extended the condition numbers $C(A)$ and $\mathscr{C}(A)$ to versions $\overline{C}(A)$ and $\overline{\mathscr{C}}(A)$ that coincide with the former on $\mathbb{R}^{n\times m}\setminus\Sigma$ but are finite on $\Sigma$. To do so, the basic idea was to stratify the set $\Sigma$ in strata that share similar cones of solutions $K(A)$. This idea is not new; in some sense, the passage, say, from $C(A)$ to $\overline{C}(A)$ mimics the passage from the classical condition number $\kappa(M)$ for the computation of the inverse $M^{-1}$ of a square matrix $M\in\mathbb{R}^{n\times n}$ to $\kappa^\dagger(M)$, for the computation of its Moore–Penrose inverse. While $\kappa(M)=\infty$ when $M$ is not invertible, $\kappa^\dagger(M)<\infty$ for all $M\in\mathbb{R}^{n\times n}$. And for non-zero matrices $M$, $1/\kappa^\dagger(M)$ can be characterized as the normalized distance from $M$ to the set of matrices having rank less than $\operatorname{rank}(M)$. For a detailed discussion of these properties of the Moore–Penrose inverse, see [1,2,6].

A natural question arising in front of this collection of finite condition numbers is whether one can bound any of them in terms of the others. If possible, one would like to do so by multiplying by a scaling factor that depends on the dimensions $m$ and $n$ and maybe on some other feature of $A$. The main goal of this paper is to do this.

2. Basic definitions and main results

2.1. Some known condition measures

Let $A$ be any matrix in $\mathbb{R}^{n\times m}$ which, in the rest of this paper, we assume has no zero row. Let
$$P = \{x\in\mathbb{R}^n : A^T x = 0,\ x\ge 0,\ \|x\| = 1\}$$
and
$$D = \{s\in\mathbb{R}^n : \exists y\in\mathbb{R}^m,\ s = Ay,\ s\ge 0,\ \|s\| = 1\}.$$
It is known [4] that there exists a unique partition $P(A) = (B, N)$ of $\{1,\ldots,n\}$ for which there exist $x\in\mathbb{R}^n$ and $y\in\mathbb{R}^m$ satisfying
$$A_B^T x_B = 0,\quad x_B > 0,\qquad A_N y > 0,\quad A_B y = 0. \tag{1}$$
In this equation $A_B$ is the matrix obtained from $A$ by deleting the rows that are not in $B$. The matrix $A_N$ and the vector $x_B$ are similarly defined. Note that $P\neq\emptyset$ iff $B\neq\emptyset$. Similarly, $D\neq\emptyset$ iff $N\neq\emptyset$.

$\sigma(A)$. Ye [10] defined the condition measure $\sigma(A)$ as follows. Define $\sigma_P(A)=\infty$ if $B=\emptyset$. Otherwise,
$$\sigma_P(A) := \min_{j\in B}\,\max_{x\in P}\, x_j.$$
Similarly, define $\sigma_D(A)=\infty$ if $N=\emptyset$. Otherwise,
$$\sigma_D(A) := \min_{j\in N}\,\max_{s\in D}\, s_j.$$
Finally, define $\sigma(A)=\min\{\sigma_P(A),\sigma_D(A)\}$.

$\overline{C}(A)$. Assume a norm in $\operatorname{Lin}(\mathbb{R}^m,\mathbb{R}^n)$ (inducing norms in $\operatorname{Lin}(\mathbb{R}^m,\mathbb{R}^B)$ and $\operatorname{Lin}(\mathbb{R}^m,\mathbb{R}^N)$). Define, when $N\neq\emptyset$,
$$\rho_N = \min_{\substack{P(\tilde A)\neq P(A)\\ \tilde A_B = A_B}} \|\tilde A - A\|$$
and, when $B\neq\emptyset$, letting $L=\operatorname{kernel}(A_B)$ denote the kernel of $A_B$,
$$\rho_B = \min_{\substack{P(\tilde A)\neq P(A)\\ \tilde A_N = A_N\\ \operatorname{kernel}(\tilde A_B)\supseteq L}} \|\tilde A - A\|.$$
If either $N$ or $B$ is empty we let the corresponding $\rho$ be infinity. Finally, define
$$\overline{C}(A) = \max\left\{\frac{\|A_B\|}{\rho_B(A)},\ \frac{\|A_N\|}{\rho_N(A)}\right\}$$
where, by convention, $\|A_\emptyset\|/\infty = 0$.
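By the uniqueness of the partition in (1), $j\in B$ exactly when $x_j$ can be made positive over $P$, and $j\in N$ exactly when $(Ay)_j$ can be made positive over $D$; so both $P(A)=(B,N)$ and Ye's $\sigma(A)$ are computable by linear programming. The following Python sketch (an illustration added in this transcription, with hypothetical function names; it is not code from the paper) uses the 1-norm normalizations $\sum_i x_i = 1$ and $\sum_i s_i = 1$, and a tolerance in place of exact arithmetic:

```python
import numpy as np
from scipy.optimize import linprog

def partition_and_sigma(A, tol=1e-9):
    """Partition P(A) = (B, N) of (1) and Ye's sigma(A), via LPs.
    Illustrative sketch only; indices are 0-based."""
    n, m = A.shape
    # max x_j  s.t.  A^T x = 0, sum(x) = 1, x >= 0   (x ranges over P)
    max_P = np.full(n, -np.inf)
    A_eq = np.vstack([A.T, np.ones((1, n))])
    b_eq = np.concatenate([np.zeros(m), [1.0]])
    for j in range(n):
        c = np.zeros(n); c[j] = -1.0          # linprog minimizes, so negate
        res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n)
        if res.status == 0:
            max_P[j] = -res.fun
    B = [j for j in range(n) if max_P[j] > tol]
    # max (Ay)_j  s.t.  Ay >= 0, sum(Ay) = 1        (s = Ay ranges over D)
    max_D = np.full(n, -np.inf)
    for j in range(n):
        res = linprog(-A[j], A_ub=-A, b_ub=np.zeros(n),
                      A_eq=np.ones((1, n)) @ A, b_eq=[1.0],
                      bounds=[(None, None)] * m)
        if res.status == 0:
            max_D[j] = -res.fun
    N = [j for j in range(n) if max_D[j] > tol]
    sigma_P = min(max_P[j] for j in B) if B else np.inf
    sigma_D = min(max_D[j] for j in N) if N else np.inf
    return B, N, min(sigma_P, sigma_D)
```

On the $3\times 2$ example of the introduction this returns $B = \{1,2\}$ and $N = \{3\}$ (in 1-based indexing), with $\sigma(A) = \min\{1/2,\,1\} = 1/2$.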

$\overline{\mathscr{C}}(A)$. Fix a norm $\|\cdot\|$ in $\mathbb{R}^m$. Let $L=\operatorname{kernel}(A_B)\subseteq\mathbb{R}^m$ denote the kernel of $A_B$ and $L^\perp = \operatorname{range}(A_B^T)\subseteq\mathbb{R}^m$ the range of $A_B^T$. If $N\neq\emptyset$ define
$$v_N = \max_{\substack{y\in L\\ y\neq 0}}\ \min_{j\in N}\ \frac{a_j y}{\|a_j\|^*\,\|y\|}.$$
Here $a_j$ denotes the $j$th row of $A$ and $\|\cdot\|^*$ the norm in $\operatorname{Lin}(\mathbb{R}^m,\mathbb{R})$ dual to $\|\cdot\|$. Notice that the definition of $P(A)=(B,N)$ guarantees that $L\neq\{0\}$ when $N\neq\emptyset$. If $B\neq\emptyset$ define
$$v_B = \max_{\substack{y\in L^\perp\\ y\neq 0}}\ \min_{j\in B}\ \frac{a_j y}{\|a_j\|^*\,\|y\|}.$$
Notice that $L^\perp\neq\{0\}$ when $B\neq\emptyset$ because the rows of $A$ are assumed to be non-zero. By convention, we let $v_N(A)=+\infty$ when $N(A)=\emptyset$ and $v_B(A)=-\infty$ when $B(A)=\emptyset$. If $N(A)\neq\emptyset$ then $v_N(A)>0$. If $B(A)\neq\emptyset$ then $v_B(A)<0$. We define $v(A) := \min\{v_N, -v_B\}$ and $\overline{\mathscr{C}}(A) := 1/v(A)$.

$\kappa^\dagger(A)$. Recall (but see [1,2] for detailed treatments) that the pseudo-inverse, or Moore–Penrose inverse, of $A\in\mathbb{R}^{n\times m}$ is the only matrix $A^\dagger\in\mathbb{R}^{m\times n}$ satisfying the following equations:
$$AXA = A,\qquad XAX = X,\qquad (AX)^T = AX,\qquad (XA)^T = XA. \tag{2}$$
Assume norms $\|\cdot\|_a$ in $\operatorname{Lin}(\mathbb{R}^m,\mathbb{R}^n)$ and $\|\cdot\|_b$ in $\operatorname{Lin}(\mathbb{R}^n,\mathbb{R}^m)$. For a matrix $A\in\mathbb{R}^{n\times m}$ we define $\kappa^\dagger(A) := \|A\|_a\,\|A^\dagger\|_b$. This condition number is a natural extension of Turing's condition number for the inversion of square matrices to Moore–Penrose inversion. In the case when $\|\cdot\|_a$ and $\|\cdot\|_b$ are operator norms, the condition number $\kappa^\dagger(A)$ is related to the distance to rank-dropping. More precisely, assume $\mathbb{R}^m$ and $\mathbb{R}^n$ are respectively endowed with norms $\|\cdot\|_q$ and $\|\cdot\|_s$ for $q,s\in\{1,2,\ldots,\infty\}$. The operator norms $\|\cdot\|_{qs}$ and $\|\cdot\|_{sq}$ are defined as follows. For $A\in\mathbb{R}^{n\times m}$,
$$\|A\|_{qs} = \max_{\|y\|_q=1}\|Ay\|_s,\qquad\text{and}\qquad \|A^\dagger\|_{sq} = \max_{\|x\|_s=1}\|A^\dagger x\|_q.$$
If $\operatorname{rank}(A)=r$ and $\Sigma_r = \{B\in\mathbb{R}^{n\times m} : \operatorname{rank}(B) < r\}$ then [6, Sections 2.5.4 and 5.5.4]
$$\frac{1}{\|A^\dagger\|_{sq}} = \inf\{\|A-\tilde A\|_{qs} : \operatorname{rank}(\tilde A) < r\}.$$
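The defining equations (2) and the rank-dropping identity above are easy to check numerically; the short sketch below (added in this transcription; it assumes spectral operator norms, $q = s = 2$) does so for a random full-rank $A$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3)) @ np.diag([1.0, 0.1, 0.01])   # rank 3
X = np.linalg.pinv(A)                                          # A^dagger

# The four Moore-Penrose equations (2).
assert np.allclose(A @ X @ A, A) and np.allclose(X @ A @ X, X)
assert np.allclose((A @ X).T, A @ X) and np.allclose((X @ A).T, X @ A)

# kappa^dagger(A) = ||A|| * ||A^dagger|| with 2-norms, and the identity
# 1 / ||A^dagger|| = min{ ||A - At|| : rank(At) < rank(A) }, whose
# right-hand side equals the smallest nonzero singular value of A.
kappa = np.linalg.norm(A, 2) * np.linalg.norm(X, 2)
sv = np.linalg.svd(A, compute_uv=False)
print(kappa, np.isclose(1.0 / np.linalg.norm(X, 2), sv[sv > 1e-12].min()))
```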

2.2. Two auxiliary condition measures

The following two condition measures have, to the best of our knowledge, not occurred in the literature. We introduce them since they appear to be closely related to $\sigma(A)$, $\overline{C}(A)$, and $\overline{\mathscr{C}}(A)$ and simplify the comparison between them.

$\nu(A)$. This measure is similar to $\sigma(A)$. Define $\nu_P(A)=\infty$ if $B=\emptyset$. Otherwise, define
$$\nu_P(A) := \max_{x\in P}\ \min_{j\in B}\ x_j.$$
Similarly, define $\nu_D(A)=\infty$ if $N=\emptyset$. Otherwise, define
$$\nu_D(A) := \max_{s\in D}\ \min_{j\in N}\ s_j.$$
Finally, define $\nu(A)=\min\{\nu_P(A),\nu_D(A)\}$. Note that the points $x, y$ in (1) guarantee that $\nu(A)>0$.

$\Theta(A)$. This measure is similar to $\rho_B(A)$ and $\rho_N(A)$. Let $\operatorname{kernel}(A^T) = \{x\in\mathbb{R}^n : A^T x = 0\}$ and $\operatorname{range}(A) = \{Ay : y\in\mathbb{R}^m\}$ be the null space of $A^T$ and the range space of $A$, respectively. In addition, let $\mathbb{R}^n_{++} = \{x\in\mathbb{R}^n : x > 0\}$ and, for $N\neq\emptyset$,
$$\operatorname{range}_N(A) = \{A_N y : A_B y = 0\}.$$
For $B, N\neq\emptyset$, $P(A)=(B,N)$ iff $\operatorname{kernel}(A_B^T)\cap\mathbb{R}^B_{++}\neq\emptyset$ and $\operatorname{range}_N(A)\cap\mathbb{R}^N_{++}\neq\emptyset$. In what follows, also for $B,N\neq\emptyset$, denote $k = \dim\operatorname{kernel}(A_B^T)$ and $r = \dim\operatorname{range}_N(A)$.

For $1\le l\le s$, recall, the Grassmannian $G^s_l$ is the set of linear subspaces of $\mathbb{R}^s$ with dimension $l$. Fix norms $\|\cdot\|_p$ and $\|\cdot\|_q$ in $\mathbb{R}^s$. We define a distance $\operatorname{dist}_{pq}$ in $G^s_l$ by
$$\operatorname{dist}_{pq}(L,\tilde L) := \max_{0\neq x\in L}\ \min_{\tilde x\in\tilde L}\ \frac{\|x-\tilde x\|_q}{\|x\|_p}.$$
Note that in general $\operatorname{dist}_{pq}(L,\tilde L)\neq\operatorname{dist}_{pq}(\tilde L, L)$ since the roles of $L$ and $\tilde L$ in the definition of $\operatorname{dist}_{pq}(L,\tilde L)$ are not symmetric.

Define $\Theta^P_{pq}(A)=\infty$ if $B=\emptyset$. Otherwise, define
$$\Theta^P_{pq}(A) = \min_{\substack{L\in G^B_k\\ L\cap\mathbb{R}^B_{++}=\emptyset}} \operatorname{dist}_{pq}(\operatorname{kernel}(A_B^T), L).$$
Similarly, define $\Theta^D_{pq}(A)=\infty$ if $N=\emptyset$. Otherwise, define
$$\Theta^D_{pq}(A) = \min_{\substack{L\in G^N_r\\ L\cap\mathbb{R}^N_{++}=\emptyset}} \operatorname{dist}_{pq}(\operatorname{range}_N(A), L).$$
Finally, define $\Theta_{pq}(A) = \min\{\Theta^P_{pq}(A), \Theta^D_{pq}(A)\}$.

2.3. The main results

The six condition measures above are actually six families of condition measures. Indeed, each of them depends on a choice of norms for some of the spaces $\mathbb{R}^m$, $\mathbb{R}^n$, $\operatorname{Lin}(\mathbb{R}^m,\mathbb{R}^n)$, and $\operatorname{Lin}(\mathbb{R}^n,\mathbb{R}^m)$, as shown in the table below:

Measure      | R^m | R^n | Lin(R^m, R^n) | Lin(R^n, R^m)
σ(A)         |  -  |  *  |      -        |      -
C̄(A)         |  -  |  -  |      *        |      -
𝒞̄(A)         |  *  |  -  |      -        |      -
ν(A)         |  -  |  *  |      -        |      -
Θ_pq(A)      |  -  |  ** |      -        |      -
κ†(A)        |  -  |  -  |      *        |      *

where a dash means no norm is needed, a star (*) means a norm needs to be specified, and the two stars (**) refer to the norms $p$ and $q$ in $\mathbb{R}^n$.

To state our main results, Theorems 1–4, specific choices of the norms above need to be made. Similar results for other choices of norms follow in a straightforward manner by using well-known bounds for norm comparisons. The norm in $\mathbb{R}^n$ corresponding to $\sigma(A)$ and $\nu(A)$ appears in the definition of $P$ and $D$ and, to follow the original definition of Ye, we took it to be the 1-norm in Section 2.1 and in our main results. The norms corresponding to $\overline{C}(A)$, $\overline{\mathscr{C}}(A)$, $\Theta_{pq}(A)$, and $\kappa^\dagger(A)$ are specified in the statements of Theorems 1–4. In the case of $\Theta_{pq}(A)$ this is done with the subindex $pq$. When $p=q=2$, however, we will eliminate the subindex 22 and write $\Theta(A)$ (as well as $\operatorname{dist}(L,\tilde L)$, $\Theta^P(A)$ and $\Theta^D(A)$).

Theorem 1. For any matrix $A\in\mathbb{R}^{n\times m}$, $\nu(A) = \Theta_{1\infty}(A)$.

Theorem 2. For any matrix $A\in\mathbb{R}^{n\times m}$, $\nu(A)\le\sigma(A)\le n\,\nu(A)$.

Theorem 3. Consider $\operatorname{Lin}(\mathbb{R}^m,\mathbb{R}^n)$ and $\operatorname{Lin}(\mathbb{R}^n,\mathbb{R}^m)$ endowed with the operator norm associated with the 2-norm in both $\mathbb{R}^m$ and $\mathbb{R}^n$. For any matrix $A\in\mathbb{R}^{n\times m}$,
$$\frac{1}{\overline{C}(A)}\ \le\ \Theta(A)\ \le\ \frac{\max\{\kappa^\dagger(A_B),\ \kappa^\dagger(A_N)\}}{\overline{C}(A)}.$$

Theorem 4. For any norm $\|\cdot\|_Y$ in $\mathbb{R}^m$, the operator norm in $\operatorname{Lin}(\mathbb{R}^m,\mathbb{R}^n)$ induced by $\|\cdot\|_Y$ and the $\infty$-norm, and any matrix $A\in\mathbb{R}^{n\times m}$,
$$\overline{\mathscr{C}}(A)\ \le\ \overline{C}(A)\ \le\ \max\left\{\frac{\|A_B\|}{\min_{j\in B}\|a_j\|_Y^*},\ \frac{\|A_N\|}{\min_{j\in N}\|a_j\|_Y^*}\right\}\,\overline{\mathscr{C}}(A)\ \le\ \frac{\|A\|}{\min_{1\le j\le n}\|a_j\|_Y^*}\,\overline{\mathscr{C}}(A).$$
Here $\|\cdot\|_Y^*$ denotes the norm in $\operatorname{Lin}(\mathbb{R}^m,\mathbb{R})$ dual to $\|\cdot\|_Y$.

The previous four theorems yield relationships among any two of the measures $\sigma(A)$, $\overline{C}(A)$, $\overline{\mathscr{C}}(A)$, $\nu(A)$, and $\Theta_{pq}(A)$ for any choice of norms via suitable norm comparisons. The following corollary states one of the possible sets of relationships.

Corollary. Assume $\mathbb{R}^m$ and $\mathbb{R}^n$ are endowed with the 1-norm and the $\infty$-norm respectively, and $\operatorname{Lin}(\mathbb{R}^m,\mathbb{R}^n)$, $\operatorname{Lin}(\mathbb{R}^n,\mathbb{R}^m)$ are endowed with the associated operator norms. Then
$$\frac{\min_{1\le j\le n}\|a_j\|_\infty}{\sqrt{mn}\;\|A\|_{1\infty}\;\overline{\mathscr{C}}(A)}\ \le\ \frac{1}{\sqrt{mn}\;\overline{C}(A)}\ \le\ \Theta(A),$$
$$\Theta_{1\infty}(A) = \nu(A)\ \le\ \sigma(A)\ \le\ n\,\nu(A) = n\,\Theta_{1\infty}(A),$$
and
$$\Theta(A)\ \le\ \sqrt{mn}\;\frac{\max\{\kappa^\dagger(A_B),\kappa^\dagger(A_N)\}}{\overline{C}(A)}\ \le\ \sqrt{mn}\;\frac{\max\{\kappa^\dagger(A_B),\kappa^\dagger(A_N)\}}{\overline{\mathscr{C}}(A)}.$$

Proof. This follows by putting together Theorems 1 through 4 and the fact that, for $A\in\operatorname{Lin}(\mathbb{R}^m,\mathbb{R}^n)$,
$$\frac{1}{\sqrt{mn}}\,\|A\|_{22}\ \le\ \|A\|_{1\infty}\ \le\ \|A\|_{22}\ \le\ \sqrt{mn}\,\|A\|_{1\infty}. \qquad\square$$
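Like $\sigma(A)$, the auxiliary measure $\nu(A)$ is LP-computable, which makes Theorem 2 easy to test numerically. Below is a sketch for $\nu_P$ (added in this transcription, under the same assumptions and naming conventions as the earlier sketches; $\nu_D$ is built analogously, with variables $y$ in place of $x$):

```python
import numpy as np
from scipy.optimize import linprog

def nu_P(A, B):
    """nu_P(A) = max_{x in P} min_{j in B} x_j as one LP over (x, t):
    maximize t  s.t.  A^T x = 0, sum(x) = 1, x >= 0, x_j >= t (j in B).
    B = B(A) is assumed precomputed (e.g. with partition_and_sigma)."""
    if not B:
        return np.inf
    n, m = A.shape
    c = np.zeros(n + 1); c[-1] = -1.0                 # maximize t
    A_eq = np.zeros((m + 1, n + 1))
    A_eq[:m, :n], A_eq[m, :n] = A.T, 1.0              # A^T x = 0, sum x = 1
    b_eq = np.concatenate([np.zeros(m), [1.0]])
    A_ub = np.zeros((len(B), n + 1))                  # t - x_j <= 0, j in B
    for i, j in enumerate(B):
        A_ub[i, j], A_ub[i, -1] = -1.0, 1.0
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(len(B)), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * n + [(None, None)])
    return -res.fun

# On random instances one can then verify nu(A) <= sigma(A) <= n * nu(A).
```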

3. Proof of Theorem 1

Lemma 1. For any matrix $A\in\mathbb{R}^{n\times m}$, $\nu_P(A)\le\Theta^P_{1\infty}(A)$.

Proof. If $B=\emptyset$, then $\nu_P(A)=\Theta^P_{1\infty}(A)=\infty$ and the statement holds. In the following we consider $B\neq\emptyset$. Let $L\in G^B_k$ be such that
$$\operatorname{dist}_{1\infty}(\operatorname{kernel}(A_B^T), L) < \nu_P(A). \tag{3}$$
We will show that $L\cap\mathbb{R}^B_{++}\neq\emptyset$. To that end, let $x^*$ be any vector in $P$ such that
$$\nu_P(A) = \max_{x\in P}\min_{j\in B} x_j = \min_{j\in B} x^*_j. \tag{4}$$
Since $x^*\in P$, from the uniqueness of the partition $P(A)=(B,N)$ and (1) it follows that $x^*_N = 0$. Hence $\|x^*_B\|_1 = \|x^*\|_1 = 1$ and $x^*_B\in\operatorname{kernel}(A_B^T)$. By the definition of $\operatorname{dist}_{1\infty}$, there exists $\tilde x\in L$ such that
$$\|\tilde x - x^*_B\|_\infty \le \|x^*_B\|_1\,\operatorname{dist}_{1\infty}(\operatorname{kernel}(A_B^T), L). \tag{5}$$
By (3)–(5), and using that $\|x^*_B\|_1 = \|x^*\|_1 = 1$, we have
$$\|\tilde x - x^*_B\|_\infty \le \|x^*_B\|_1\,\operatorname{dist}_{1\infty}(\operatorname{kernel}(A_B^T), L) < \nu_P(A) = \min_{j\in B} x^*_j.$$
Therefore, for $j\in B$,
$$\tilde x_j = x^*_j + (\tilde x_j - x^*_j) \ge x^*_j - |\tilde x_j - x^*_j| \ge \min_{j\in B} x^*_j - \|\tilde x - x^*_B\|_\infty > 0.$$
That is, $\tilde x > 0$, and hence $L\cap\mathbb{R}^B_{++}\neq\emptyset$. We conclude, by the definition of $\Theta^P_{1\infty}$, that $\nu_P(A)\le\Theta^P_{1\infty}(A)$. $\square$

Lemma 2. For any matrix $A\in\mathbb{R}^{n\times m}$, $\nu_P(A)\ge\Theta^P_{1\infty}(A)$.

Proof. We can assume again that $B\neq\emptyset$ as otherwise the statement trivially holds. We will construct $L\in G^B_k$ such that $\operatorname{dist}_{1\infty}(\operatorname{kernel}(A_B^T), L)\le\nu_P(A)$ and $L\cap\mathbb{R}^B_{++}=\emptyset$. Let $x^*$ be any vector in $P$ such that
$$\min_{j\in B} x^*_j = \max_{x\in P}\min_{j\in B} x_j = \nu_P(A). \tag{6}$$
We already remarked that $x^*_N=0$ and hence $\|x^*_B\|_1 = \|x^*\|_1 = 1$ and $x^*_B\in\operatorname{kernel}(A_B^T)$. Denote by $e$ the vector $(1,\ldots,1)\in\mathbb{R}^B$ and $e^\perp = \{v\in\mathbb{R}^B : e^T v = 0\}$. Since $\dim\operatorname{kernel}(A_B^T) = k$ and $\dim e^\perp = |B|-1$, we have $\dim(e^\perp\cap\operatorname{kernel}(A_B^T)) \ge k-1$. Let $d_1,\ldots,d_{k-1}$ be linearly independent vectors in this space and $D\in\mathbb{R}^{B\times(k-1)}$ the matrix $[d_1,\ldots,d_{k-1}]$. Then $D$ has full column rank, $e^T D = 0$, and
$$\{Dy : y\in\mathbb{R}^{k-1}\}\subseteq\operatorname{kernel}(A_B^T). \tag{7}$$
We claim that the matrix $[x^*_B, D]$ has full column rank. To see this, assume $y_0\in\mathbb{R}$ and $y\in\mathbb{R}^{k-1}$ are such that
$$x^*_B y_0 + Dy = 0. \tag{8}$$

Since $e^T D = 0$,
$$y_0(e^T x^*_B) = y_0(e^T x^*_B) + e^T Dy = 0. \tag{9}$$
Also, since $\nu(A)>0$, by (6), $x^*_B > 0$. This implies $y_0 = 0$ which in turn, by (8), implies $Dy = 0$. Since $D$ has full column rank, $y = 0$. We have thus shown that Eq. (8) has no non-trivial solution, i.e., $[x^*_B, D]$ has full column rank. Therefore,
$$\dim\{x^*_B y_0 + Dy : y_0\in\mathbb{R},\ y\in\mathbb{R}^{k-1}\} = k$$
and consequently $\operatorname{kernel}(A_B^T) = \{x^*_B y_0 + Dy : y_0\in\mathbb{R},\ y\in\mathbb{R}^{k-1}\}$.

Let $\hat x = x^*_B - \nu_P(A)e$ and $L = \{\hat x y_0 + Dy : y_0\in\mathbb{R},\ y\in\mathbb{R}^{k-1}\}$.

Let $0\neq x\in\operatorname{kernel}(A_B^T)$. There exist $y_0\in\mathbb{R}$ and $y\in\mathbb{R}^{k-1}$ such that $x = x^*_B y_0 + Dy$. In addition, since $e^T D = 0$,
$$e^T x = e^T(x^*_B y_0 + Dy) = (e^T x^*_B)y_0 = y_0. \tag{10}$$
Let $\tilde x = \hat x y_0 + Dy$. Then $\tilde x\in L$. Moreover, by the definition of $\hat x$,
$$\|\tilde x - x\|_\infty = \|\hat x y_0 - x^*_B y_0\|_\infty = \|\hat x - x^*_B\|_\infty\,|y_0| = \nu_P(A)\,|y_0|. \tag{11}$$
Combining (10) and (11) when $y_0\neq 0$, and using that $\|x\|_1\ge|e^T x| = |y_0|$, we obtain
$$\frac{\|\tilde x - x\|_\infty}{\|x\|_1} \le \frac{\nu_P(A)\,|y_0|}{|y_0|} = \nu_P(A),$$
an inequality that trivially holds when $y_0 = 0$. Therefore, by the definition of $\operatorname{dist}_{1\infty}$,
$$\operatorname{dist}_{1\infty}(\operatorname{kernel}(A_B^T), L)\le\nu_P(A). \tag{12}$$
To finish, we next show that $L\cap\mathbb{R}^B_{++}=\emptyset$. Assume, to the contrary, that there exists $\tilde x\in L$ satisfying $\tilde x > 0$. Since $\tilde x\in L$, there exist $\hat y_0\in\mathbb{R}$ and $\hat y\in\mathbb{R}^{k-1}$ such that
$$\tilde x = \hat x\hat y_0 + D\hat y. \tag{13}$$
Therefore,
$$\|\tilde x\|_1 = e^T\tilde x = e^T(\hat x\hat y_0 + D\hat y) = e^T\hat x\,\hat y_0 = \|\hat x\|_1\,\hat y_0,$$
which implies $\hat y_0 > 0$. Define $\bar x\in\mathbb{R}^n$ by taking $\bar x_N = 0$ and
$$\bar x_B = x^*_B\hat y_0 + D\hat y. \tag{14}$$
Using $\tilde x > 0$ along with (13), (14), and the definition of $\hat x$,
$$\bar x_B = \tilde x + (x^*_B - \hat x)\hat y_0 > (x^*_B - \hat x)\hat y_0 = \nu_P(A)\,e\,\hat y_0. \tag{15}$$
Since both $\hat y_0, \nu_P(A) > 0$ we deduce $\bar x_B > 0$. In addition, using (14) and the equality $e^T D = 0$,
$$\|\bar x_B\|_1 = e^T\bar x_B = e^T(x^*_B\hat y_0 + D\hat y) = e^T x^*_B\,\hat y_0 = \hat y_0. \tag{16}$$
Combining Eqs. (15) and (16), we obtain
$$\min_{j\in B}\frac{\bar x_j}{\|\bar x_B\|_1} > \frac{\nu_P(A)\,\hat y_0}{\hat y_0} = \nu_P(A). \tag{17}$$
However,
$$\frac{\bar x}{\|\bar x\|_1}\in P. \tag{18}$$

Eqs. (17) and (18) contradict the definition of $\nu_P(A)$. We thus conclude that $L\cap\mathbb{R}^B_{++}=\emptyset$ and, consequently, by the definition of $\Theta^P_{1\infty}$ and (12),
$$\Theta^P_{1\infty}(A)\le\operatorname{dist}_{1\infty}(\operatorname{kernel}(A_B^T), L)\le\nu_P(A). \qquad\square$$

Proof of Theorem 1. Combining Lemmas 1 and 2, we have, for any matrix $A\in\mathbb{R}^{n\times m}$,
$$\Theta^P_{1\infty}(A) = \nu_P(A). \tag{19}$$
Let $\bar D$ be any matrix such that $\operatorname{kernel}(\bar D^T) = \operatorname{range}(A)$. Recall that $P(A)=(B,N)$. Then
$$P(\bar D) = (N, B), \tag{20}$$
$$\operatorname{range}_N(A) = \operatorname{kernel}(\bar D_N^T), \tag{21}$$
and
$$\nu_D(A) = \nu_P(\bar D). \tag{22}$$
By (19) applied to $\bar D$, $\Theta^P_{1\infty}(\bar D) = \nu_P(\bar D)$. Let $r := \dim\operatorname{range}_N(A) = \dim\operatorname{kernel}(\bar D_N^T)$. Combining equalities (21) and (22),
$$\nu_D(A) = \Theta^P_{1\infty}(\bar D) = \min_{\substack{L\in G^N_r\\ L\cap\mathbb{R}^N_{++}=\emptyset}}\operatorname{dist}_{1\infty}(\operatorname{kernel}(\bar D_N^T), L) \qquad\text{(by the definition of }\Theta^P\text{)}$$
$$= \min_{\substack{L\in G^N_r\\ L\cap\mathbb{R}^N_{++}=\emptyset}}\operatorname{dist}_{1\infty}(\operatorname{range}_N(A), L) \qquad\text{(by (21))}$$
$$= \Theta^D_{1\infty}(A) \qquad\text{(by the definition of }\Theta^D\text{)}.$$
Notice that in the expression $L\in G^N_r$ in the second step above, the superscript $N$ refers to the partition of $A$ and not to the one of $\bar D$. We conclude that
$$\nu(A) = \min\{\nu_P(A), \nu_D(A)\} = \min\{\Theta^P_{1\infty}(A), \Theta^D_{1\infty}(A)\} = \Theta_{1\infty}(A). \qquad\square$$

4. Proof of Theorem 2

Lemma 3. For any matrix $A\in\mathbb{R}^{n\times m}$, $\nu(A)\le\sigma(A)$.

Proof. Assume $B\neq\emptyset$, in particular $P\neq\emptyset$. Let $j^*$ be any index in $B$ such that
$$\max_{x\in P} x_{j^*} = \min_{j\in B}\max_{x\in P} x_j = \sigma_P(A).$$
Let $x^*$ be any vector in $P$ such that
$$\min_{j\in B} x^*_j = \max_{x\in P}\min_{j\in B} x_j = \nu_P(A).$$
Using these equalities it follows that $x^*_{j^*}\le\sigma_P(A)$ and $x^*_{j^*}\ge\nu_P(A)$, and hence that $\sigma_P(A)\ge\nu_P(A)$. Now assume $B=\emptyset$. Then $\nu_P(A)=\sigma_P(A)=\infty$ and hence $\sigma_P(A)\ge\nu_P(A)$ as well. Similarly, one proves $\sigma_D(A)\ge\nu_D(A)$ and, hence, the statement. $\square$

Lemma 4. For any matrix $A\in\mathbb{R}^{n\times m}$, $\sigma(A)\le n\,\nu(A)$.

Proof. For $j\in B$, let $x^{(j)}$ be any vector in $P$ such that $x^{(j)}_j = \max_{x\in P} x_j$. Then $\sigma_P(A) = \min_{j\in B} x^{(j)}_j$. Let $\bar x = \sum_{j\in B} x^{(j)}$. Since $x^{(j)}\in P$, $x^{(j)}_j > 0$ for all $j\in B$. Therefore,
$$\bar x_j = \sum_{k\in B} x^{(k)}_j \ge x^{(j)}_j \ge \sigma_P(A). \tag{23}$$
Furthermore, $\|x^{(j)}\|_1 = 1$ for all $j\in B$. Therefore,
$$\|\bar x\|_1 = \Big\|\sum_{j\in B} x^{(j)}\Big\|_1 \le \sum_{j\in B}\|x^{(j)}\|_1 = |B| \le n.$$
By its definition, $\bar x\in\operatorname{kernel}(A^T)$, $\bar x\ge 0$ and $\bar x\neq 0$. It follows that $\bar x/\|\bar x\|_1\in P$. Hence, by (23),
$$\nu_P(A) = \max_{x\in P}\min_{j\in B} x_j \ge \min_{j\in B}\frac{\bar x_j}{\|\bar x\|_1} \ge \frac{\sigma_P(A)}{n}.$$
Similarly, one can prove $\nu_D(A)\ge\sigma_D(A)/n$ and we conclude that $\nu(A)\ge\sigma(A)/n$. $\square$

Theorem 2 now follows from Lemmas 3 and 4.

5. Proof of Theorem 3

The following lemma shows that in the case $p=q=2$ the function $\operatorname{dist}$ is indeed a distance. Since we have not been able to find a proof in the literature, we give one in the Appendix.

Lemma 5. For any $L, \tilde L\in G^n_m$,
(i) $\displaystyle\operatorname{dist}(L,\tilde L) = \operatorname{dist}(\tilde L^\perp, L^\perp) = \max_{\substack{x\in L,\ x\neq 0\\ s\in\tilde L^\perp,\ s\neq 0}}\frac{x^T s}{\|x\|\,\|s\|}$, and
(ii) $\operatorname{dist}(L,\tilde L) = \operatorname{dist}(\tilde L, L)$.
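For $p = q = 2$ the distance $\operatorname{dist}(L,\tilde L)$ has a closed form: the minimum over $\tilde x\in\tilde L$ is a projection residual, so the maximum over unit $x\in L$ is a spectral norm (the sine of the largest principal angle). The sketch below (an illustration added in this transcription, not from the paper) computes it and exhibits the symmetry asserted in Lemma 5(ii):

```python
import numpy as np

def dist22(BL, BLt):
    """dist(L, Lt) for p = q = 2, where L = range(BL), Lt = range(BLt).
    min_{xt in Lt} ||x - xt|| = ||(I - P_Lt) x||, so the max over unit
    x in L is the spectral norm of (I - P_Lt) Q_L."""
    QL, _ = np.linalg.qr(BL)                 # orthonormal basis of L
    QLt, _ = np.linalg.qr(BLt)               # orthonormal basis of Lt
    P = QLt @ QLt.T                          # orthogonal projector onto Lt
    return np.linalg.norm((np.eye(BL.shape[0]) - P) @ QL, 2)

rng = np.random.default_rng(1)
L, Lt = rng.standard_normal((6, 2)), rng.standard_normal((6, 2))
print(dist22(L, Lt), dist22(Lt, L))          # equal, as Lemma 5(ii) asserts
```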

Lemma 6. For any matrix $A\in\mathbb{R}^{n\times m}$ with $B(A)\neq\emptyset$,
$$\frac{\rho_B(A)}{\|A_B\|}\le\Theta^P(A).$$

Proof. Let $L\in G^B_k$ be such that
$$\operatorname{dist}(\operatorname{kernel}(A_B^T), L) < \frac{\rho_B(A)}{\|A_B\|}.$$
We will show that $L\cap\mathbb{R}^B_{++}\neq\emptyset$. Let $L^\perp$ be the orthogonal complement of $L$ in $\mathbb{R}^B$. By Lemma 5,
$$\operatorname{dist}(\operatorname{range}(A_B), L^\perp) < \frac{\rho_B(A)}{\|A_B\|}. \tag{24}$$
Let $\tilde A_B$ be any matrix in $\mathbb{R}^{B\times m}$ such that $\operatorname{range}(\tilde A_B) = L^\perp$ and let $\tilde A_B^\dagger\in\mathbb{R}^{m\times B}$ be the pseudo-inverse of $\tilde A_B$. It is known that, for any $s\in\mathbb{R}^B$, $\tilde A_B^\dagger s$ is the least-squares solution of the system $\tilde A_B y = s$, i.e.,
$$\|\tilde A_B(\tilde A_B^\dagger s) - s\| = \min_{y\in\mathbb{R}^m}\|\tilde A_B y - s\|.$$
Let $y$ be any vector in $\mathbb{R}^m$ and substitute $s$ by $A_B y$ in the equality above. We obtain
$$\|\tilde A_B(\tilde A_B^\dagger A_B y) - A_B y\| = \min_{\bar y\in\mathbb{R}^m}\|\tilde A_B\bar y - A_B y\|$$
which implies
$$\frac{\|(\tilde A_B\tilde A_B^\dagger A_B - A_B)y\|}{\|A_B y\|} = \min_{s\in\operatorname{range}(\tilde A_B)}\frac{\|s - A_B y\|}{\|A_B y\|} \le \max_{\substack{\bar s\in\operatorname{range}(A_B)\\ \bar s\neq 0}}\min_{s\in\operatorname{range}(\tilde A_B)}\frac{\|s-\bar s\|}{\|\bar s\|} = \operatorname{dist}(\operatorname{range}(A_B),\operatorname{range}(\tilde A_B)).$$
Since this inequality holds for any $y\in\mathbb{R}^m$, by the definition of the operator norm,
$$\|\tilde A_B\tilde A_B^\dagger A_B - A_B\| \le \|A_B\|\operatorname{dist}(\operatorname{range}(A_B),\operatorname{range}(\tilde A_B)) < \|A_B\|\,\frac{\rho_B(A)}{\|A_B\|} = \rho_B(A),$$
the last by Eq. (24). This implies $\|\tilde A_B\tilde A_B^\dagger A_B - A_B\| < \rho_B(A)$.

Let $\bar A_B = \tilde A_B\tilde A_B^\dagger A_B$ and $\bar A_N = A_N$. Then $\operatorname{kernel}(\bar A_B)\supseteq\operatorname{kernel}(A_B)$ and
$$\|\bar A - A\| = \|\tilde A_B\tilde A_B^\dagger A_B - A_B\| < \rho_B(A).$$
By the definition of $\rho_B(A)$, $P(\bar A) = P(A)$. Hence, there exists $x_B\in\mathbb{R}^B$ such that $x_B > 0$ and $\bar A_B^T x_B = 0$. It follows that
$$0 = \bar A_B^T x_B = A_B^T(\tilde A_B^\dagger)^T\tilde A_B^T x_B = A_B^T\tilde A_B\tilde A_B^\dagger x_B,$$
the last step by (2). Let $\tilde x_B = \tilde A_B\tilde A_B^\dagger x_B$. Clearly, $\tilde x_B\in\operatorname{range}(\tilde A_B)$. In addition, the equality above shows that $\tilde x_B\in\operatorname{kernel}(A_B^T)$.

Assume $\tilde x_B\neq 0$. Then, by Lemma 5(i),
$$\operatorname{dist}(\operatorname{range}(A_B),\operatorname{range}(\tilde A_B)) = \max_{\substack{\tilde s\in\operatorname{kernel}(A_B^T),\ \tilde s\neq 0\\ x\in\operatorname{range}(\tilde A_B),\ x\neq 0}}\frac{x^T\tilde s}{\|x\|\,\|\tilde s\|} \ge \frac{\tilde x_B^T\tilde x_B}{\|\tilde x_B\|\,\|\tilde x_B\|} = 1.$$
Combining this inequality with Eq. (24) we obtain $\rho_B(A)/\|A_B\| > 1$. This contradicts the definition of $\rho_B(A)$. Therefore,
$$\tilde x_B = \tilde A_B\tilde A_B^\dagger x_B = 0.$$
Note that $\tilde x_B = \tilde A_B\tilde A_B^\dagger x_B$ is the orthogonal projection of $x_B$ onto $\operatorname{range}(\tilde A_B)$. The only possibility for this projection to be 0 is that $x_B$ is in $\operatorname{kernel}(\tilde A_B^T)$, that is, since $\operatorname{range}(\tilde A_B) = L^\perp$, that $x_B\in L$. But $x_B > 0$ and hence $L\cap\mathbb{R}^B_{++}\neq\emptyset$.

We have thus proved that for all $L\in G^B_k$ with $\operatorname{dist}(\operatorname{kernel}(A_B^T), L) < \rho_B(A)/\|A_B\|$ we have $L\cap\mathbb{R}^B_{++}\neq\emptyset$. By the definition of $\Theta^P(A)$, this implies $\rho_B(A)/\|A_B\|\le\Theta^P(A)$. $\square$
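The least-squares property of the pseudo-inverse invoked at the start of the proof above is also easy to confirm numerically (a check added in this transcription, using a rank-deficient matrix to exercise the general case):

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 4))  # rank <= 3
s = rng.standard_normal(6)

y_pinv = np.linalg.pinv(M) @ s                   # M^dagger s
y_lstsq, *_ = np.linalg.lstsq(M, s, rcond=None)  # a least-squares solution

# Both minimize ||M y - s||, so the fitted values agree.
print(np.allclose(M @ y_pinv, M @ y_lstsq))
```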

Lemma 7. For any matrix $A\in\mathbb{R}^{n\times m}$ and $s\in\operatorname{range}(A)$, there exists $w\in\mathbb{R}^n$ such that $s = AA^T w$.

Proof. By hypothesis, there exists $y\in\mathbb{R}^m$ such that $s = Ay$. Let $y_R\in\operatorname{range}(A^T)$ and $y_K\in\operatorname{kernel}(A)$ be such that $y = y_R + y_K$. Then
$$s = Ay = A(y_R + y_K) = Ay_R + Ay_K = Ay_R,$$
the last since $y_K\in\operatorname{kernel}(A)$. Now use that $y_R\in\operatorname{range}(A^T)$ to deduce the existence of $w\in\mathbb{R}^n$ such that $y_R = A^T w$ and conclude that $s = AA^T w$. $\square$

Lemma 8. For any matrix $A\in\mathbb{R}^{n\times m}$ with $B(A)\neq\emptyset$,
$$\Theta^P(A)\le\rho_B(A)\,\|A_B^\dagger\| = \kappa^\dagger(A_B)\,\frac{\rho_B(A)}{\|A_B\|}.$$

Proof. Let $\tilde A\in\mathbb{R}^{n\times m}$ be such that $\tilde A_N = A_N$, $\operatorname{range}(\tilde A_B^T)\subseteq\operatorname{range}(A_B^T)$ and
$$\|\tilde A_B - A_B\| < \frac{\Theta^P(A)}{\|A_B^\dagger\|}. \tag{25}$$
We will show that $P(\tilde A) = (B,N) = P(A)$. For $w\in\mathbb{R}^B$,
$$\|\tilde A_B A_B^T w - A_B A_B^T w\| = \|(\tilde A_B - A_B)A_B^T w\| \le \|\tilde A_B - A_B\|\,\|A_B^T w\|$$
which implies
$$\frac{\|\tilde A_B A_B^T w - A_B A_B^T w\|}{\|A_B A_B^T w\|} < \frac{\Theta^P(A)}{\|A_B^\dagger\|}\,\frac{\|A_B^T w\|}{\|A_B A_B^T w\|} \le \Theta^P(A)\,\frac{\|A_B^T w\|}{\|A_B^\dagger A_B A_B^T w\|} = \Theta^P(A)\,\frac{\|A_B^T w\|}{\|(A_B^\dagger A_B)^T A_B^T w\|}$$
$$= \Theta^P(A)\,\frac{\|A_B^T w\|}{\|A_B^T (A_B^\dagger)^T A_B^T w\|} = \Theta^P(A)\,\frac{\|A_B^T w\|}{\|A_B^T w\|} = \Theta^P(A),$$
where the two equalities involving transposes hold by (2).

Thus, by Lemma 7,
$$\operatorname{dist}(\operatorname{range}(A_B),\operatorname{range}(\tilde A_B)) = \max_{\substack{s\in\operatorname{range}(A_B)\\ s\neq 0}}\min_{\tilde s\in\operatorname{range}(\tilde A_B)}\frac{\|\tilde s - s\|}{\|s\|} = \max_{w\in\mathbb{R}^B}\min_{\tilde s\in\operatorname{range}(\tilde A_B)}\frac{\|\tilde s - A_B A_B^T w\|}{\|A_B A_B^T w\|} \le \max_{w\in\mathbb{R}^B}\frac{\|\tilde A_B A_B^T w - A_B A_B^T w\|}{\|A_B A_B^T w\|} < \Theta^P(A).$$
By Lemma 5,
$$\operatorname{dist}(\operatorname{kernel}(A_B^T),\operatorname{kernel}(\tilde A_B^T)) = \operatorname{dist}(\operatorname{range}(A_B),\operatorname{range}(\tilde A_B)) < \Theta^P(A). \tag{26}$$
Since $\operatorname{range}(\tilde A_B^T)\subseteq\operatorname{range}(A_B^T)$, $\operatorname{rank}(\tilde A_B)\le\operatorname{rank}(A_B)$. On the other hand, it is known that
$$\frac{1}{\|A^\dagger\|} = \min_{\operatorname{rank}(\bar A) < \operatorname{rank}(A)}\|\bar A - A\|.$$
Thus, from (25) it follows that $\operatorname{rank}(\tilde A_B)\ge\operatorname{rank}(A_B)$. Therefore, $\operatorname{rank}(\tilde A_B) = \operatorname{rank}(A_B)$. So $\operatorname{kernel}(\tilde A_B^T)\in G^B_k$ and, by (26) and the definition of $\Theta^P$, $\operatorname{kernel}(\tilde A_B^T)\cap\mathbb{R}^B_{++}\neq\emptyset$. Thus $P(\tilde A) = (B, N)$. From the definition of $\rho_B$ we finally obtain $\Theta^P(A)\le\|A_B^\dagger\|\,\rho_B(A)$. $\square$

The following proposition immediately follows from Lemmas 6 and 8.

Proposition 1. For any matrix $A\in\mathbb{R}^{n\times m}$ with $B(A)\neq\emptyset$,
$$\frac{\rho_B(A)}{\|A_B\|}\le\Theta^P(A)\le\rho_B(A)\,\|A_B^\dagger\| = \kappa^\dagger(A_B)\,\frac{\rho_B(A)}{\|A_B\|}.$$

We next proceed with the case $N(A)\neq\emptyset$.

Lemma 9. For any matrix $A\in\mathbb{R}^{n\times m}$ with $N(A)\neq\emptyset$,
$$\frac{\rho_N}{\|A_N\|}\le\Theta^D(A).$$

Proof. Let $L\in G^N_r$ be such that
$$\operatorname{dist}(\operatorname{range}_N(A), L) < \frac{\rho_N}{\|A_N\|}. \tag{27}$$
We will show that $L\cap\mathbb{R}^N_{++}\neq\emptyset$. Let $\tilde A_N$ be any matrix in $\mathbb{R}^{N\times m}$ such that $\operatorname{range}(\tilde A_N) = L$. As in Lemma 6 we have
$$\|\tilde A_N(\tilde A_N^\dagger s) - s\| = \min_{y\in\mathbb{R}^m}\|\tilde A_N y - s\|. \tag{28}$$
Let $h = \dim\operatorname{kernel}(A_B)$ and let $Z$ be any matrix in $\mathbb{R}^{m\times h}$ such that the columns of $Z$ form an orthonormal basis for $\operatorname{kernel}(A_B)$, i.e., $\operatorname{range}(Z) = \operatorname{kernel}(A_B)$ and $Z^T Z = I$. For any $y\in\mathbb{R}^h$, substituting $s$ by $A_N Z y$ in Eq. (28) we obtain
$$\|\tilde A_N(\tilde A_N^\dagger A_N Z y) - A_N Z y\| = \min_{\bar y\in\mathbb{R}^m}\|\tilde A_N\bar y - A_N Z y\| = \min_{s\in\operatorname{range}(\tilde A_N)}\|s - A_N Z y\|$$

and reasoning as in Lemma 6 we obtain
$$\max_{y\in\mathbb{R}^h}\frac{\|(\tilde A_N\tilde A_N^\dagger A_N Z - A_N Z)y\|}{\|A_N Z y\|} \le \max_{\substack{\bar s\in\operatorname{range}_N(A)\\ \bar s\neq 0}}\min_{s\in\operatorname{range}(\tilde A_N)}\frac{\|s - \bar s\|}{\|\bar s\|} = \operatorname{dist}(\operatorname{range}_N(A),\operatorname{range}(\tilde A_N)).$$
By the definitions of operator norm and $\operatorname{dist}$,
$$\|\tilde A_N\tilde A_N^\dagger A_N Z - A_N Z\| \le \|A_N Z\|\,\operatorname{dist}(\operatorname{range}_N(A),\operatorname{range}(\tilde A_N)) < \|A_N Z\|\,\frac{\rho_N}{\|A_N\|},$$
the last by Eq. (27). Since $\|Z\| = 1$, $\|A_N Z\|\le\|A_N\|$ and hence
$$\|\tilde A_N\tilde A_N^\dagger A_N Z - A_N Z\| < \rho_N. \tag{29}$$
Let $\bar A_N = \tilde A_N\tilde A_N^\dagger A_N ZZ^T + A_N(I - ZZ^T)$ and $\bar A_B = A_B$. Then, since $\|Z^T\| = 1$,
$$\|\bar A - A\| = \|\bar A_N - A_N\| = \|(\tilde A_N\tilde A_N^\dagger A_N Z - A_N Z)Z^T\| \le \|\tilde A_N\tilde A_N^\dagger A_N Z - A_N Z\| < \rho_N$$
by Eq. (29). By the definition of $\rho_N$, $P(\bar A) = P(A)$. Therefore, there exists $y\in\mathbb{R}^m$ such that $\bar A_N y > 0$ and $\bar A_B y = 0$. Since $y\in\operatorname{kernel}(A_B) = \operatorname{range}(Z)$, there exists $y'\in\mathbb{R}^h$ such that $y = Z y'$. In addition, by the definition of $\bar A_N$ and using $Z^T Z = I$,
$$\bar A_N y = \tilde A_N\tilde A_N^\dagger A_N Z y'.$$
Since $\bar A_N y > 0$, $\operatorname{range}(\tilde A_N)\cap\mathbb{R}^N_{++}\neq\emptyset$. Also, since $L = \operatorname{range}(\tilde A_N)$, $L\cap\mathbb{R}^N_{++}\neq\emptyset$. We have thus proved that for all $L\in G^N_r$ with $\operatorname{dist}(\operatorname{range}_N(A), L) < \rho_N/\|A_N\|$ we have $L\cap\mathbb{R}^N_{++}\neq\emptyset$. By the definition of $\Theta^D(A)$, this implies $\rho_N/\|A_N\|\le\Theta^D(A)$. $\square$

Lemma 10. For any matrix $A\in\mathbb{R}^{n\times m}$ with $N(A)\neq\emptyset$,
$$\Theta^D(A)\le\rho_N\,\|A_N^\dagger\| = \kappa^\dagger(A_N)\,\frac{\rho_N}{\|A_N\|}.$$

Proof. Let $\tilde A_B = A_B$ and let $\tilde A_N$ be any matrix in $\mathbb{R}^{N\times m}$ such that
$$\|\tilde A_N - A_N\| < \frac{\Theta^D(A)}{\|A_N^\dagger\|}. \tag{30}$$
For $y\in\mathbb{R}^m$,
$$\|\tilde A_N y - A_N y\| = \|(\tilde A_N - A_N)y\| < \frac{\Theta^D(A)}{\|A_N^\dagger\|}\,\|y\|$$
which implies, using (2),
$$\frac{\|\tilde A_N y - A_N y\|}{\|A_N y\|} < \frac{\Theta^D(A)\,\|y\|}{\|A_N^\dagger\|\,\|A_N y\|} \le \frac{\Theta^D(A)\,\|y\|}{\|A_N^\dagger A_N y\|} = \frac{\Theta^D(A)\,\|y\|}{\|A_N^T(A_N^\dagger)^T y\|}.$$
Since $A_N^T(A_N^\dagger)^T y$ is the orthogonal projection of $y$ onto $\operatorname{range}(A_N^T)$, it follows that
$$\frac{\|\tilde A_N y - A_N y\|}{\|A_N y\|} < \Theta^D(A). \tag{31}$$

Let $Z$ be any matrix in $\mathbb{R}^{m\times h}$ such that the columns of $Z$ form an orthonormal basis for $\operatorname{kernel}(A_B)$. For $y\in\mathbb{R}^h$, using (30) and $\|Z\| = 1$,
$$\|\tilde A_N Z y - A_N Z y\| = \|(\tilde A_N - A_N)Z y\| < \frac{\Theta^D(A)}{\|A_N^\dagger\|}\,\|y\|$$
which implies
$$\frac{\|\tilde A_N Z y - A_N Z y\|}{\|y\|} < \frac{\Theta^D(A)}{\|A_N^\dagger\|} \le \frac{1}{\|A_N^\dagger\|}.$$
It follows that
$$\|\tilde A_N Z - A_N Z\| = \max_{y\in\mathbb{R}^h}\frac{\|(\tilde A_N Z - A_N Z)y\|}{\|y\|} < \frac{1}{\|A_N^\dagger\|}.$$
Using (2) it is easy to show that $(A_N Z)^\dagger = Z^T A_N^\dagger$ and, therefore, $\|(A_N Z)^\dagger\| = \|Z^T A_N^\dagger\| \le \|A_N^\dagger\|$, and hence
$$\|\tilde A_N Z - A_N Z\| < \frac{1}{\|(A_N Z)^\dagger\|}.$$
Now use that
$$\frac{1}{\|(A_N Z)^\dagger\|} = \min_{\operatorname{rank}(\bar A) < \operatorname{rank}(A_N Z)}\|\bar A - A_N Z\|$$
to deduce that $\operatorname{rank}(\tilde A_N Z)\ge\operatorname{rank}(A_N Z)$, which implies that
$$r = \dim\operatorname{range}_N(A) = \dim\operatorname{range}(A_N Z) \le \dim\operatorname{range}(\tilde A_N Z).$$
Let $L$ be any linear subspace in $G^N_r$ such that $L\subseteq\operatorname{range}(\tilde A_N Z)$. Then, using Lemma 5(ii),
$$\operatorname{dist}(\operatorname{range}_N(A), L) = \max_{\substack{\tilde s\in L\\ \tilde s\neq 0}}\min_{s\in\operatorname{range}_N(A)}\frac{\|\tilde s - s\|}{\|\tilde s\|} \le \max_{\substack{\tilde s\in\operatorname{range}(\tilde A_N Z)\\ \tilde s\neq 0}}\min_{s\in\operatorname{range}_N(A)}\frac{\|\tilde s - s\|}{\|\tilde s\|}$$
$$= \max_{\substack{\tilde y\in\operatorname{range}(Z)\\ \tilde A_N\tilde y\neq 0}}\min_{s\in\operatorname{range}_N(A)}\frac{\|s - \tilde A_N\tilde y\|}{\|\tilde A_N\tilde y\|} \le \max_{\substack{\tilde y\in\operatorname{kernel}(A_B)\\ \tilde A_N\tilde y\neq 0}}\frac{\|A_N\tilde y - \tilde A_N\tilde y\|}{\|\tilde A_N\tilde y\|} < \Theta^D(A)$$
by (31), since $\operatorname{range}(Z) = \operatorname{kernel}(A_B)$. By the definition of $\Theta^D$, $L\cap\mathbb{R}^N_{++}\neq\emptyset$. And since $L\subseteq\operatorname{range}(\tilde A_N Z)$, $\tilde A_N(\operatorname{range}(Z))\cap\mathbb{R}^N_{++}\neq\emptyset$. In other words, there exists $y\in\mathbb{R}^m$ such that $\tilde A_N y > 0$ and $\tilde A_B y = 0$. This shows $P(A) = P(\tilde A)$. By the definition of $\rho_N$, we finally obtain $\Theta^D(A)\le\|A_N^\dagger\|\,\rho_N$. $\square$

Again, the following proposition immediately follows from Lemmas 9 and 10.

Proposition 2. For any matrix $A\in\mathbb{R}^{n\times m}$ with $N(A)\neq\emptyset$,
$$\frac{\rho_N(A)}{\|A_N\|}\le\Theta^D(A)\le\rho_N(A)\,\|A_N^\dagger\| = \kappa^\dagger(A_N)\,\frac{\rho_N(A)}{\|A_N\|}.$$

Theorem 3 now follows from Propositions 1 and 2.

6. Proof of Theorem 4

The main result in [5] (take $\alpha_j = 1$ for $j = 1,\ldots,n$ in Theorem 1 therein) states that, for any norm $\|\cdot\|_Y$ in $\mathbb{R}^m$ and $\operatorname{Lin}(\mathbb{R}^m,\mathbb{R}^n)$ endowed with the operator norm induced by $\|\cdot\|_Y$ and the $\infty$-norm, one has, if $N\neq\emptyset$,
$$\rho_N(A) = \max_{\substack{y\in L\\ y\neq 0}}\min_{j\in N}\frac{a_j y}{\|y\|_Y}.$$
Using that
$$\|A_N\| = \max_{\|y\|_Y = 1}\|A_N y\|_\infty = \max_{j\in N}\|a_j\|_Y^*,$$
it follows that
$$v_N(A) = \max_{\substack{y\in L\\ y\neq 0}}\min_{j\in N}\frac{a_j y}{\|a_j\|_Y^*\,\|y\|_Y} \ge \max_{\substack{y\in L\\ y\neq 0}}\min_{j\in N}\frac{a_j y}{\max_{l\in N}\|a_l\|_Y^*\,\|y\|_Y} = \frac{\rho_N(A)}{\|A_N\|}$$
as well as
$$v_N(A) \le \max_{\substack{y\in L\\ y\neq 0}}\min_{j\in N}\frac{a_j y}{\min_{l\in N}\|a_l\|_Y^*\,\|y\|_Y} = \frac{\rho_N(A)}{\min_{j\in N}\|a_j\|_Y^*}.$$
Theorem 1 in [5] (again with $\alpha_j = 1$ for $j = 1,\ldots,n$) also shows that, if $B\neq\emptyset$,
$$\rho_B(A) = -\max_{\substack{y\in L^\perp\\ y\neq 0}}\min_{j\in B}\frac{a_j y}{\|y\|_Y}$$
and, reasoning as above, it follows that
$$\frac{\rho_B(A)}{\|A_B\|} \le -v_B(A) \le \frac{\rho_B(A)}{\min_{j\in B}\|a_j\|_Y^*}.$$
The conclusion of Theorem 4 is now immediate.

Acknowledgments

The second author was partially supported by CERG grant CityU 100707. The third author was partially supported by NSF grant CCF-0830533.

Appendix

Lemma 11. For any $\tilde L\in G^n_m$ and $x\in\mathbb{R}^n$,
$$\min_{\tilde x\in\tilde L}\|x - \tilde x\| = \max_{\substack{s\in\tilde L^\perp\\ s\neq 0}}\frac{x^T s}{\|s\|}.$$

Proof. Let $\tilde x^*$ be any vector in $\tilde L$ such that
$$\|x - \tilde x^*\| = \min_{\tilde x\in\tilde L}\|x - \tilde x\|. \tag{32}$$
It is known that $(x - \tilde x^*)\perp\tilde L$. Therefore,
$$\max_{\substack{s\in\tilde L^\perp\\ s\neq 0}}\frac{x^T s}{\|s\|} \ge \frac{x^T(x - \tilde x^*)}{\|x - \tilde x^*\|} = \frac{(x - \tilde x^* + \tilde x^*)^T(x - \tilde x^*)}{\|x - \tilde x^*\|} = \frac{(x - \tilde x^*)^T(x - \tilde x^*)}{\|x - \tilde x^*\|} + \frac{(\tilde x^*)^T(x - \tilde x^*)}{\|x - \tilde x^*\|}$$
$$= \|x - \tilde x^*\| \qquad\text{(since }\tilde x^*\in\tilde L\text{ and }(x - \tilde x^*)\in\tilde L^\perp\text{)}$$
$$= \min_{\tilde x\in\tilde L}\|x - \tilde x\| \qquad\text{(by (32))}.$$
On the other hand, let $s^*$ be any vector in $\tilde L^\perp$ such that
$$\frac{x^T s^*}{\|s^*\|} = \max_{\substack{s\in\tilde L^\perp\\ s\neq 0}}\frac{x^T s}{\|s\|}. \tag{33}$$
Since $\tilde x^*\in\tilde L$ and $s^*\in\tilde L^\perp$, $(\tilde x^*)^T s^* = 0$ and therefore
$$x^T s^* = (x - \tilde x^*)^T s^* \le \|x - \tilde x^*\|\,\|s^*\|,$$
that is,
$$\min_{\tilde x\in\tilde L}\|x - \tilde x\| = \|x - \tilde x^*\| \ge \frac{x^T s^*}{\|s^*\|} = \max_{\substack{s\in\tilde L^\perp\\ s\neq 0}}\frac{x^T s}{\|s\|},$$
using (32) and (33). $\square$

Proof of Lemma 5(i).
$$\operatorname{dist}(L,\tilde L) = \max_{\substack{x\in L\\ x\neq 0}}\min_{\tilde x\in\tilde L}\frac{\|x - \tilde x\|}{\|x\|} = \max_{\substack{x\in L\\ x\neq 0}}\frac{1}{\|x\|}\max_{\substack{s\in\tilde L^\perp\\ s\neq 0}}\frac{x^T s}{\|s\|} \qquad\text{(by Lemma 11)}$$
$$= \max_{\substack{x\in L,\ x\neq 0\\ s\in\tilde L^\perp,\ s\neq 0}}\frac{x^T s}{\|x\|\,\|s\|} = \max_{\substack{s\in\tilde L^\perp\\ s\neq 0}}\frac{1}{\|s\|}\max_{\substack{x\in L\\ x\neq 0}}\frac{s^T x}{\|x\|}$$
$$= \max_{\substack{s\in\tilde L^\perp\\ s\neq 0}}\frac{1}{\|s\|}\min_{\bar x\in L^\perp}\|s - \bar x\| \qquad\text{(by Lemma 11)}$$
$$= \max_{\substack{s\in\tilde L^\perp\\ s\neq 0}}\min_{\bar x\in L^\perp}\frac{\|s - \bar x\|}{\|s\|} = \operatorname{dist}(\tilde L^\perp, L^\perp). \qquad\square$$

Lemma 12. For any $L,\tilde L\in G^n_m$, if $\operatorname{dist}(L,\tilde L) < 1$, then $L\cap\tilde L^\perp = \{0\}$. In particular, $L + \tilde L^\perp = \mathbb{R}^n$.

Proof. Suppose there exists $x\in L\cap\tilde L^\perp$, $x\neq 0$. Then, by Lemma 5(i),
$$\operatorname{dist}(L,\tilde L) = \max_{\substack{x'\in L,\ x'\neq 0\\ s\in\tilde L^\perp,\ s\neq 0}}\frac{(x')^T s}{\|x'\|\,\|s\|} \ge \frac{x^T x}{\|x\|\,\|x\|} = 1,$$
in contradiction with our hypothesis. So $L\cap\tilde L^\perp = \{0\}$. The second statement follows from the fact that $L$ and $\tilde L^\perp$ have dimensions $m$ and $n-m$, respectively. $\square$

Proof of Lemma 5(ii). Let us consider the following two cases: (i) $\operatorname{dist}(L,\tilde L)\ge 1$ and (ii) $\operatorname{dist}(L,\tilde L) < 1$.

In case (i), by the definition of $\operatorname{dist}$,
$$\operatorname{dist}(\tilde L, L) = \max_{\substack{\tilde x\in\tilde L\\ \tilde x\neq 0}}\min_{x\in L}\frac{\|\tilde x - x\|}{\|\tilde x\|} \le \max_{\substack{\tilde x\in\tilde L\\ \tilde x\neq 0}}\frac{\|\tilde x - 0\|}{\|\tilde x\|} = 1 \le \operatorname{dist}(L,\tilde L).$$
Let us consider case (ii). Let $\tilde x$ be any vector in $\tilde L$ such that
$$\min_{x'\in L}\frac{\|\tilde x - x'\|}{\|\tilde x\|} = \max_{\substack{\tilde x'\in\tilde L\\ \tilde x'\neq 0}}\min_{x'\in L}\frac{\|\tilde x' - x'\|}{\|\tilde x'\|} = \operatorname{dist}(\tilde L, L). \tag{34}$$
Since $\operatorname{dist}(L,\tilde L) < 1$, by Lemma 12 there exist $x\in L$ and $s\in\tilde L^\perp$ such that $\tilde x = x + s$. Note that, since $\tilde x\in\tilde L$ and $s\in\tilde L^\perp$, $\tilde x\perp s$ and hence $\|x\|^2 = \|\tilde x - s\|^2 = \|\tilde x\|^2 + \|s\|^2$. By (34),
$$\operatorname{dist}(\tilde L, L) = \min_{x'\in L}\frac{\|\tilde x - x'\|}{\|\tilde x\|} \le \frac{1}{\|\tilde x\|}\left\|\tilde x - \frac{\|\tilde x\|^2}{\|\tilde x\|^2 + \|s\|^2}\,x\right\| \qquad\left(\text{since }\frac{\|\tilde x\|^2}{\|\tilde x\|^2 + \|s\|^2}\,x\in L\right)$$
$$= \frac{\big\|(\|\tilde x\|^2 + \|s\|^2)\tilde x - \|\tilde x\|^2(\tilde x - s)\big\|}{\|\tilde x\|\,(\|\tilde x\|^2 + \|s\|^2)} \qquad(\text{since } x = \tilde x - s)$$
$$= \frac{\big\|\,\|s\|^2\tilde x + \|\tilde x\|^2 s\,\big\|}{\|\tilde x\|\,(\|\tilde x\|^2 + \|s\|^2)} = \frac{\sqrt{\|s\|^4\|\tilde x\|^2 + \|\tilde x\|^4\|s\|^2}}{\|\tilde x\|\,(\|\tilde x\|^2 + \|s\|^2)} \qquad(\text{since }\tilde x\perp s)$$
$$= \frac{\|s\|}{\sqrt{\|s\|^2 + \|\tilde x\|^2}}.$$

In addition, by Lemma 5(i),
$$\operatorname{dist}(L,\tilde L) \ge \frac{x^T(-s)}{\|x\|\,\|s\|} = \frac{-s^T(\tilde x - s)}{\|x\|\,\|s\|} \qquad(\text{since } x = \tilde x - s)$$
$$= \frac{s^T s}{\|\tilde x - s\|\,\|s\|} \qquad(\text{since }\tilde x\perp s)$$
$$= \frac{\|s\|}{\|\tilde x - s\|} = \frac{\|s\|}{\sqrt{\|s\|^2 + \|\tilde x\|^2}} \qquad(\text{since }\tilde x\perp s).$$
In conclusion, for any $L,\tilde L\in G^n_m$ (no matter whether or not $\operatorname{dist}(L,\tilde L)\ge 1$), $\operatorname{dist}(\tilde L, L)\le\operatorname{dist}(L,\tilde L)$. Similarly, one can show that $\operatorname{dist}(L,\tilde L)\le\operatorname{dist}(\tilde L, L)$. We thus conclude that $\operatorname{dist}(L,\tilde L) = \operatorname{dist}(\tilde L, L)$. $\square$

References

[1] A. Ben-Israel, T.N.E. Greville, Generalized Inverses: Theory and Applications, 2nd edition, Springer-Verlag, 2003.
[2] S.L. Campbell, C.D. Meyer, Generalized Inverses of Linear Transformations, Pitman, 1979.
[3] D. Cheung, F. Cucker, A new condition number for linear programming, Math. Program. 91 (2001) 163–174.
[4] D. Cheung, F. Cucker, J. Peña, Unifying condition numbers for linear programming, Math. Oper. Res. 28 (2003) 609–624.
[5] D. Cheung, F. Cucker, J. Peña, On strata of degenerate polyhedral cones, I: Condition and distance to strata, European J. Oper. Res. 198 (2009) 23–28.
[6] G. Golub, C. Van Loan, Matrix Computations, 3rd edition, Johns Hopkins Univ. Press, 1996.
[7] J. Renegar, Some perturbation theory for linear programming, Math. Program. 65 (1994) 73–91.
[8] J. Renegar, Incorporating condition measures into the complexity theory of linear programming, SIAM J. Optim. 5 (1995) 506–524.
[9] J. Renegar, Linear programming, complexity theory and elementary functional analysis, Math. Program. 70 (1995) 279–351.
[10] Y. Ye, Toward probabilistic analysis of interior-point algorithms for linear programming, Math. Oper. Res. 19 (1994) 38–52.