Journal of Complexity 26 (2010) 209–226

On strata of degenerate polyhedral cones, II: Relations between condition measures

Dennis Cheung (a), Felipe Cucker (b,*), Javier Peña (c)

(a) United International College, Tang Jia Wan, Zhuhai, Guangdong Province, PR China
(b) Department of Mathematics, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong
(c) Tepper School of Business, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213-3890, USA

Article history: Received 2 January 2009; accepted 27 October 2009; available online 10 November 2009.

Keywords: Linear programming; Complementarity problems; Condition numbers

Abstract. In a paper (Cheung, Cucker and Peña (in press) [5]) that can be seen as the first part of this one, we extended the well-known condition numbers for polyhedral conic systems C(A) (Renegar (1994, 1995) [7–9]) and C′(A) (Cheung and Cucker (2001) [3]) to versions C̄(A) and C̄′(A) that are finite for all input matrices A ∈ R^{n×m}. In this paper we compare C̄(A) and C̄′(A) with other condition measures for the same problem that are also always finite. © 2009 Elsevier Inc. All rights reserved.

1. Introduction

Consider the problem of, given a matrix A ∈ R^{n×m} (with n ≥ m), deciding whether the system Ay ≥ 0 has non-zero solutions. The set K(A) of solutions of such a system is a closed pointed polyhedral cone in R^m. Let d(A) = dim K(A) be its dimension. When d(A) ∈ {1, 2, ..., m−1}, arbitrarily small perturbations Ã of the data A can turn the dimension d(Ã) of the resulting cone to zero and hence can change the output of the problem above from Yes to No. To analyze both the complexity and the accuracy (under finite precision arithmetic) of a number of algorithms solving our problem, Renegar [7–9] defined a condition number C(A) as the reciprocal of the normalized distance from A to the set Σ of ill-posed inputs.
This set Σ consists precisely of those matrices A for which d(A) ∈ {1, 2, ..., m−1}. A related condition number, denoted C′(A), was introduced in [3]. Roughly speaking, C(A) is defined in terms of the geometry of the space R^{n×m} of data and C′(A) in terms of the geometry in the space R^m of solutions (e.g., 1/C′(A) is the opening of K(A) when d(A) = m). The main result in [3],

* Corresponding author.
E-mail addresses: dennisc@uic.edu.hk (D. Cheung), macucker@cityu.edu.hk (F. Cucker), jfp@andrew.cmu.edu (J. Peña).

0885-064X/$ – see front matter © 2009 Elsevier Inc. All rights reserved.
doi:10.1016/j.jco.2009.11.001
though, characterizes C′(A) in terms of a kind of column-wise normalized distance to ill-posedness. In particular, both C(A) and C′(A) are infinite when (and only when) A ∈ Σ.

Independently of the above, other condition measures were developed which, in contrast with C(A) and C′(A), are finite for all data A. A notable example is σ(A), introduced by Ye in [10]. Such condition measures have been used for the complexity analysis of infinite precision algorithms and yield sharper complexity bounds in the sense that they yield bounds that are always finite.

Recently, in [5], we extended the condition numbers C(A) and C′(A) to versions C̄(A) and C̄′(A) that coincide with the former on R^{n×m} \ Σ but are finite on Σ. To do so, the basic idea was to stratify the set Σ in strata that share similar cones of solutions K(A). This idea is not new; in some sense, the passage, say, from C(A) to C̄(A) mimics the passage from the classical condition number κ(M) for the computation of the inverse M^{-1} of a square matrix M ∈ R^{n×n} to κ†(M), for the computation of its Moore–Penrose inverse. While κ(M) = ∞ when M is not invertible, κ†(M) < ∞ for all M ∈ R^{n×n}. And for non-zero matrices M, 1/κ†(M) can be characterized as the normalized distance from M to the set of matrices having rank less than rank(M). For a detailed discussion of these properties of the Moore–Penrose inverse, see [1,2,6].

A natural question arising in front of this collection of finite condition numbers is whether one can bound any of them in terms of the others. If possible, one would like to do so by multiplying by a scaling factor that depends on the dimensions m and n and maybe on some other feature of A. The main goal of this paper is to do this.

2. Basic definitions and main results

2.1. Some known condition measures

Let A be any matrix in R^{n×m} which, in the rest of this paper, we assume has no zero row.
Let

P = {x ∈ R^n : A^T x = 0, x ≥ 0, ‖x‖ = 1}

and

D = {s ∈ R^n : ∃y ∈ R^m, s = Ay, s ≥ 0, ‖s‖ = 1}.

It is known [4] that there exists a unique partition P(A) = (B, N) of {1, ..., n} for which there exist x ∈ R^n and y ∈ R^m satisfying

A_B^T x_B = 0,  x_B > 0,  A_N y > 0,  A_B y = 0.   (1)

In this equation A_B is the matrix obtained from A by deleting the rows that are not in B. The matrix A_N and the vector x_B are similarly defined. Note that P ≠ ∅ iff B ≠ ∅. Similarly, D ≠ ∅ iff N ≠ ∅.

σ(A). Ye [10] defined the condition measure σ(A) as follows. Define σ_P(A) = ∞ if B = ∅. Otherwise,

σ_P(A) := min_{j∈B} max_{x∈P} x_j.

Similarly define σ_D(A) = ∞ if N = ∅. Otherwise, define

σ_D(A) := min_{j∈N} max_{s∈D} s_j.

Finally, define σ(A) = min{σ_P(A), σ_D(A)}.

C̄(A). Assume a norm in Lin(R^m, R^n) (inducing norms in Lin(R^m, R^B) and Lin(R^m, R^N)). Define, when N ≠ ∅,

ρ_N = inf{ ‖Ã − A‖ : P(Ã) ≠ P(A), Ã_B = A_B }

and, when B ≠ ∅, let L = kernel(A_B) denote the kernel of A_B and

ρ_B = inf{ ‖Ã − A‖ : P(Ã) ≠ P(A), Ã_N = A_N, kernel(Ã_B) ⊇ L }.
If either N or B is empty we let the corresponding ρ be infinity. Finally, define

C̄(A) = max{ ‖A_B‖/ρ_B(A), ‖A_N‖/ρ_N(A) }

where, by convention, ‖A_∅‖ = 0.

C̄′(A). Fix a norm ‖·‖ in R^m. Let L = kernel(A_B) ⊆ R^m denote the kernel of A_B and L̄ = range(A_B^T) ⊆ R^m the range of A_B^T. If N ≠ ∅ define

v_N = max_{y∈L, y≠0} min_{j∈N} (a_j y)/(‖a_j‖* ‖y‖).

Here a_j denotes the jth row of A and ‖·‖* the norm in Lin(R^m, R) dual to ‖·‖. Notice that the definition of P(A) = (B, N) guarantees that L ≠ {0} when N ≠ ∅. If B ≠ ∅ define

v_B = max_{y∈L̄, y≠0} min_{j∈B} (a_j y)/(‖a_j‖* ‖y‖).

Notice that L̄ ≠ {0} when B ≠ ∅ because the rows of A are assumed to be non-zero. By convention, we let v_N(A) = +∞ when N(A) = ∅ and v_B(A) = −∞ when B(A) = ∅. If N(A) ≠ ∅ then v_N(A) > 0. If B(A) ≠ ∅ then v_B(A) < 0. We define v(A) := min{v_N, −v_B} and C̄′(A) := 1/v(A).

κ†(A). Recall (but see [1,2] for detailed treatments) that the pseudo-inverse or Moore–Penrose inverse of A ∈ R^{n×m} is the only matrix A† ∈ R^{m×n} satisfying, with X = A†, the following equations:

AXA = A,  XAX = X,  (AX)^T = AX,  and  (XA)^T = XA.   (2)

Assume norms ‖·‖_a in Lin(R^m, R^n) and ‖·‖_b in Lin(R^n, R^m). For a matrix A ∈ R^{n×m} we define κ†(A) := ‖A‖_a ‖A†‖_b. This condition number is a natural extension of Turing's condition number for inversion of square matrices to Moore–Penrose inversion. In the case when ‖·‖_a and ‖·‖_b are operator norms, the condition number κ†(A) is related to the distance to rank dropping. More precisely, assume R^m and R^n are respectively endowed with norms ‖·‖_q and ‖·‖_s for q, s ∈ {1, 2, ..., ∞}. The operator norms ‖·‖_{qs} and ‖·‖_{sq} are defined as follows. For A ∈ R^{n×m},

‖A‖_{qs} = max_{‖y‖_q=1} ‖Ay‖_s  and  ‖A†‖_{sq} = max_{‖x‖_s=1} ‖A† x‖_q.

If rank(A) = r and Σ_r = {B ∈ R^{n×m} : rank(B) < r} then [6, Sections 2.5.4 and 5.5.4]

1/‖A†‖_{sq} = inf{ ‖A − Ã‖_{qs} : rank(Ã) < r }.

2.2. Two auxiliary condition measures

The following two condition measures have, to the best of our knowledge, not occurred in the literature.
We introduce them since they appear to be closely related to σ(A), C̄(A), and C̄′(A), and they simplify the comparisons between these measures.
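Both the partition P(A) = (B, N) and Ye's measure σ(A) are computable by linear programming, since each inner maximum in their definitions is the optimal value of an LP over P or D. The following sketch is not from the paper; it is a minimal illustration assuming NumPy and SciPy (linprog), using 0-based row indices and the 1-norm on R^n as in Section 2.1.

```python
import numpy as np
from scipy.optimize import linprog

def partition(A, tol=1e-9):
    """Partition (B, N) of the rows of A: j is in B iff some x >= 0
    with A^T x = 0 has x_j > 0 (cf. Eq. (1)); N is the complement."""
    n, m = A.shape
    B = set()
    for j in range(n):
        c = np.zeros(n); c[j] = -1.0            # maximize x_j
        res = linprog(c, A_eq=A.T, b_eq=np.zeros(m),
                      bounds=[(0, 1)] * n, method="highs")
        if res.status == 0 and -res.fun > tol:
            B.add(j)
    return B, set(range(n)) - B

def sigma(A):
    """sigma(A) = min{sigma_P(A), sigma_D(A)} with the 1-norm on R^n."""
    n, m = A.shape
    B, N = partition(A)
    # sigma_P: min over j in B of max{x_j : A^T x = 0, x >= 0, sum x = 1}.
    sigP = np.inf
    for j in B:
        c = np.zeros(n); c[j] = -1.0
        Aeq = np.vstack([A.T, np.ones((1, n))])
        beq = np.concatenate([np.zeros(m), [1.0]])
        res = linprog(c, A_eq=Aeq, b_eq=beq,
                      bounds=[(0, None)] * n, method="highs")
        sigP = min(sigP, -res.fun)
    # sigma_D: min over j in N of max{s_j : s = Ay, s >= 0, sum s = 1},
    # with stacked variables z = (y, s).
    sigD = np.inf
    for j in N:
        c = np.zeros(m + n); c[m + j] = -1.0
        Aeq = np.block([[A, -np.eye(n)],
                        [np.zeros((1, m)), np.ones((1, n))]])
        beq = np.concatenate([np.zeros(n), [1.0]])
        res = linprog(c, A_eq=Aeq, b_eq=beq,
                      bounds=[(None, None)] * m + [(0, None)] * n,
                      method="highs")
        sigD = min(sigD, -res.fun)
    return min(sigP, sigD)

# Rows a_1 = (1,0), a_2 = (-1,0), a_3 = (0,1): here B = {1,2}, N = {3}
# (1-based), sigma_P = 1/2 and sigma_D = 1, so sigma(A) = 1/2.
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]])
print(partition(A))   # ({0, 1}, {2}) in 0-based indices
print(sigma(A))       # 0.5
```

The example matrix and the per-index LPs are illustrative choices, not constructions from the paper; for x ≥ 0 the constraint ‖x‖_1 = 1 is the linear constraint sum x = 1, which is what makes these programs linear.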
ν(A). This measure is similar to σ(A). Define ν_P(A) = ∞ if B = ∅. Otherwise, define

ν_P(A) := max_{x∈P} min_{j∈B} x_j.

Similarly, define ν_D(A) = ∞ if N = ∅. Otherwise, define

ν_D(A) := max_{s∈D} min_{j∈N} s_j.

Finally, define ν(A) = min{ν_P(A), ν_D(A)}. Note that the points x, y in (1) guarantee that ν(A) > 0.

Θ(A). This measure is similar to ρ_B(A) and ρ_N(A). Let kernel(A^T) = {x ∈ R^n : A^T x = 0} and range(A) = {Ay : y ∈ R^m} be the null space of A^T and the range space of A, respectively. In addition, let R^n_{++} = {x ∈ R^n : x > 0} and, for N ≠ ∅, range_N(A) = {A_N y : A_B y = 0}. For B, N ≠ ∅, P(A) = (B, N) iff kernel(A_B^T) ∩ R^B_{++} ≠ ∅ and range_N(A) ∩ R^N_{++} ≠ ∅. In what follows, also for B, N ≠ ∅, denote k = dim kernel(A_B^T) and r = dim range_N(A). For 1 ≤ l ≤ s, recall, the Grassmannian G^s_l is the set of linear subspaces of R^s with dimension l. Fix norms ‖·‖_p and ‖·‖_q in R^s. We define a distance dist_pq in G^s_l by

dist_pq(L, L̃) := max_{0≠x∈L} min_{x̃∈L̃} ‖x − x̃‖_q / ‖x‖_p.

Note that in general dist_pq(L, L̃) ≠ dist_pq(L̃, L) since the roles of L and L̃ in the definition of dist_pq(L, L̃) are not symmetric. Define Θ^P_pq(A) = ∞ if B = ∅. Otherwise, define

Θ^P_pq(A) = inf{ dist_pq(kernel(A_B^T), L) : L ∈ G^B_k, L ∩ R^B_{++} = ∅ }.

Similarly define Θ^D_pq(A) = ∞ if N = ∅. Otherwise, define

Θ^D_pq(A) = inf{ dist_pq(range_N(A), L) : L ∈ G^N_r, L ∩ R^N_{++} = ∅ }.

Finally, define Θ_pq(A) = min{Θ^P_pq(A), Θ^D_pq(A)}.

2.3. The main results

The six condition measures above are actually six families of condition measures. Indeed, each of them depends on a choice of norms for some of the spaces R^m, R^n, Lin(R^m, R^n), and Lin(R^n, R^m), as shown in the table below:

Measure      R^m   R^n   Lin(R^m, R^n)   Lin(R^n, R^m)
σ(A)         –     *     –               –
C̄(A)         –     –     *               –
C̄′(A)        *     –     –               –
ν(A)         –     *     –               –
Θ_pq(A)      –     **    –               –
κ†(A)        –     –     *               *

where a dash means no norm is needed, a star * means a norm needs to be specified, and the two stars ** refer to the norms ‖·‖_p and ‖·‖_q in R^n.
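For p = q = 2 the inner minimum in dist_pq is attained by orthogonal projection, so dist(L, L̃) can be computed from orthonormal bases as ‖(I − Q̃Q̃^T)Q‖_2, the sine of the largest principal angle between the subspaces. A small numerical sketch assuming NumPy (an illustration, not part of the paper):

```python
import numpy as np

def dist22(Q, Qt):
    """dist(L, Lt) for p = q = 2.  The columns of Q and Qt are
    orthonormal bases of L and Lt.  For fixed x, the minimum of
    ||x - xt||_2 over xt in Lt is ||(I - Qt Qt^T) x||_2 (orthogonal
    projection), so the max over unit x in L is an operator 2-norm."""
    n = Q.shape[0]
    return np.linalg.norm((np.eye(n) - Qt @ Qt.T) @ Q, 2)

# Example in R^3: L = span{e_1} and Lt = span{(cos t, sin t, 0)},
# for which dist(L, Lt) = sin t.
t = np.pi / 6
Q = np.array([[1.0], [0.0], [0.0]])
Qt = np.array([[np.cos(t)], [np.sin(t)], [0.0]])
print(dist22(Q, Qt))   # sin(pi/6) = 0.5 (up to rounding)
```

The example subspaces are chosen so that the single principal angle t makes the value of dist checkable by hand.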
To state our main results, Theorems 1–4, specific choices of the norms above need to be made. Similar results for other choices of norms are straightforward by using well-known bounds for norm comparisons. The norm in R^n corresponding to σ(A) and ν(A) appears in the definition of P and D and, to follow the original definition of Ye, we took them to be the 1-norm in Section 2.1 and in our main results. The norms corresponding to C̄(A), C̄′(A), Θ_pq(A), and κ†(A) are specified in the statements of Theorems 1–4. In the case of Θ_pq(A) this is done with the subindex pq. When p = q = 2, however, we will eliminate the subindex 22 and write Θ(A) (as well as dist(L, L̃), Θ^P(A) and Θ^D(A)).

Theorem 1. For any matrix A ∈ R^{n×m}, ν(A) = Θ_{1∞}(A).

Theorem 2. For any matrix A ∈ R^{n×m}, ν(A) ≤ σ(A) ≤ nν(A).

Theorem 3. Consider Lin(R^m, R^n) and Lin(R^n, R^m) endowed with the operator norm associated with the 2-norm in both R^m and R^n. For any matrix A ∈ R^{n×m},

1/C̄(A) ≤ Θ(A) ≤ max{κ†(A_B), κ†(A_N)} / C̄(A).

Theorem 4. For any norm ‖·‖_Y in R^m, the induced norm in Lin(R^m, R^n), and any matrix A ∈ R^{n×m},

C̄′(A) ≤ C̄(A) ≤ max{ ‖A_B‖_Y / min_{j∈B} ‖a_j‖*_Y, ‖A_N‖_Y / min_{j∈N} ‖a_j‖*_Y } C̄′(A) ≤ ( ‖A‖_Y / min_{j≤n} ‖a_j‖*_Y ) C̄′(A).

Here ‖·‖*_Y denotes the norm in Lin(R^m, R) dual to ‖·‖_Y.

The previous four theorems yield relationships among any two of the measures σ(A), C̄(A), C̄′(A), ν(A), Θ_pq(A) for any choice of norms via suitable norm comparisons. The following corollary states one of the possible sets of relationships.

Corollary 1. Assume R^m and R^n are endowed with the ∞-norm and the 1-norm respectively, and Lin(R^m, R^n), Lin(R^n, R^m) are endowed with the associated operator norms. Then

min_{j≤n} ‖a_j‖ / (√(mn) ‖A‖ C̄′(A)) ≤ 1/(√(mn) C̄(A)) ≤ Θ_{1∞}(A),

Θ_{1∞}(A) = ν(A) ≤ σ(A) ≤ nν(A) = nΘ_{1∞}(A),

and

Θ_{1∞}(A) ≤ √(mn) max{κ†(A_B), κ†(A_N)} / C̄(A) ≤ √(mn) max{κ†(A_B), κ†(A_N)} / C̄′(A).

Proof. This follows by putting together Theorems 1 through 4 and the fact that, for A ∈ Lin(R^m, R^n),

(1/√(mn)) ‖A‖ ≤ ‖A‖_{22} ≤ ‖A‖ ≤ √(mn) ‖A‖_{22}.  □
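For q = s = 2 the identity 1/‖A†‖_{sq} = inf{‖A − Ã‖_{qs} : rank(Ã) < r} of Section 2.1 is easy to check numerically: ‖A†‖_2 = 1/σ_r, where σ_r is the smallest nonzero singular value, and by the Eckart–Young theorem σ_r is the 2-norm distance from A to the matrices of smaller rank. A sketch assuming NumPy (an illustration, not part of the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))          # full column rank r = 3 (generically)
r = np.linalg.matrix_rank(A)

Adag = np.linalg.pinv(A)                 # Moore-Penrose inverse, cf. Eq. (2)
kappa = np.linalg.norm(A, 2) * np.linalg.norm(Adag, 2)

s = np.linalg.svd(A, compute_uv=False)   # singular values, descending
# 1/||A^dagger||_2 equals sigma_r, the 2-norm distance to rank < r
# (Eckart-Young), and kappa^dagger(A) = sigma_1/sigma_r.
assert np.isclose(1.0 / np.linalg.norm(Adag, 2), s[r - 1])
assert np.isclose(kappa, s[0] / s[r - 1])
```

In particular, κ†(A) blows up exactly as A approaches the set Σ_r of rank-deficient matrices, mirroring the behavior of Θ(A) relative to C̄(A) in Theorem 3.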
3. Proof of Theorem 1

Lemma 1. For any matrix A ∈ R^{n×m}, ν_P(A) ≤ Θ^P_{1∞}(A).

Proof. If B = ∅, then ν_P(A) = Θ^P_{1∞}(A) = ∞ and the statement holds. In the following we consider B ≠ ∅. Let L ∈ G^B_k be such that

dist_{1∞}(kernel(A_B^T), L) < ν_P(A).   (3)

We will show that L ∩ R^B_{++} ≠ ∅. To that end, let x* be any vector in P such that

ν_P(A) = max_{x∈P} min_{j∈B} x_j = min_{j∈B} x*_j.   (4)

Since x* ∈ P, from the uniqueness of the partition P(A) = (B, N) and (1) it follows that x*_N = 0. Hence ‖x*_B‖_1 = ‖x*‖_1 = 1 and x*_B ∈ kernel(A_B^T). By the definition of dist_{1∞}, there exists x̃ ∈ L such that

‖x̃ − x*_B‖_∞ ≤ ‖x*_B‖_1 dist_{1∞}(kernel(A_B^T), L).   (5)

By (3)–(5), and using that ‖x*_B‖_1 = ‖x*‖_1 = 1, we have

‖x̃ − x*_B‖_∞ ≤ dist_{1∞}(kernel(A_B^T), L) < ν_P(A) = min_{j∈B} x*_j.

Therefore, for j ∈ B,

x̃_j = x*_j + (x̃_j − x*_j) ≥ x*_j − ‖x̃ − x*_B‖_∞ > 0.

That is, x̃ > 0, and hence L ∩ R^B_{++} ≠ ∅. We conclude, by the definition of Θ^P_{1∞}, that ν_P(A) ≤ Θ^P_{1∞}(A).  □

Lemma 2. For any matrix A ∈ R^{n×m}, ν_P(A) ≥ Θ^P_{1∞}(A).

Proof. We can assume again that B ≠ ∅ as otherwise the statement trivially holds. We will construct L ∈ G^B_k such that dist_{1∞}(kernel(A_B^T), L) ≤ ν_P(A) and L ∩ R^B_{++} = ∅. Let x* be any vector in P such that

min_{j∈B} x*_j = max_{x∈P} min_{j∈B} x_j = ν_P(A).   (6)

We already remarked that x*_N = 0 and hence ‖x*_B‖_1 = ‖x*‖_1 = 1 and x*_B ∈ kernel(A_B^T). Denote by e the vector (1, ..., 1) ∈ R^B and by e⊥ = {v ∈ R^B : e^T v = 0} its orthogonal complement. Since dim kernel(A_B^T) = k and dim e⊥ = |B| − 1 we have dim(e⊥ ∩ kernel(A_B^T)) ≥ k − 1. Let d_1, ..., d_{k−1} be linearly independent vectors in this space and D ∈ R^{B×(k−1)} the matrix [d_1, ..., d_{k−1}]. Then D has full column rank, e^T D = 0, and

kernel(A_B^T) ⊇ {Dy : y ∈ R^{k−1}}.   (7)

We claim that the matrix [x*_B, D] has full column rank. To see this, assume y_0 ∈ R and y ∈ R^{k−1} are such that

x*_B y_0 + Dy = 0.   (8)
Since e^T D = 0,

y_0 (e^T x*_B) = y_0 (e^T x*_B) + e^T Dy = 0.   (9)

Also, since ν(A) > 0, by (6), x*_B > 0. This implies y_0 = 0, which in turn, by (8), implies Dy = 0. Since D has full column rank, y = 0. We have thus shown that Eq. (8) has no non-trivial solution, i.e., [x*_B, D] has full column rank. Therefore, dim{x*_B y_0 + Dy : y_0 ∈ R, y ∈ R^{k−1}} = k and consequently

kernel(A_B^T) = {x*_B y_0 + Dy : y_0 ∈ R, y ∈ R^{k−1}}.

Let x̄ = x*_B − ν_P(A) e and L = {x̄ y_0 + Dy : y_0 ∈ R, y ∈ R^{k−1}}. Let 0 ≠ x ∈ kernel(A_B^T). There exist y_0 ∈ R and y ∈ R^{k−1} such that x = x*_B y_0 + Dy. In addition, since e^T D = 0,

e^T x = e^T (x*_B y_0 + Dy) = (e^T x*_B) y_0 = y_0.   (10)

Let x̃ = x̄ y_0 + Dy. Then x̃ ∈ L. Moreover, by the definition of x̄,

‖x̃ − x‖_∞ = ‖x̄ y_0 − x*_B y_0‖_∞ = |y_0| ‖x̄ − x*_B‖_∞ = ν_P(A) |y_0|.   (11)

Combining (10) and (11), when y_0 ≠ 0 we obtain

‖x̃ − x‖_∞ / ‖x‖_1 ≤ ν_P(A) |y_0| / |e^T x| = ν_P(A),

an inequality that trivially holds when y_0 = 0. Therefore, by the definition of dist_{1∞},

dist_{1∞}(kernel(A_B^T), L) ≤ ν_P(A).   (12)

To finish, we next show that L ∩ R^B_{++} = ∅. Assume, to the contrary, that there exists x̃ ∈ L satisfying x̃ > 0. Since x̃ ∈ L, there exist ŷ_0 ∈ R and ŷ ∈ R^{k−1} such that

x̃ = x̄ ŷ_0 + D ŷ.   (13)

Therefore,

‖x̃‖_1 = e^T x̃ = e^T (x̄ ŷ_0 + D ŷ) = (e^T x̄) ŷ_0,

which implies ŷ_0 > 0. Define x̂ ∈ R^n by taking x̂_N = 0 and

x̂_B = x*_B ŷ_0 + D ŷ.   (14)

Using x̃ > 0 along with (13), (14), and the definition of x̄,

x̂_B = x̃ + (x*_B − x̄) ŷ_0 > (x*_B − x̄) ŷ_0 = ν_P(A) e ŷ_0.   (15)

Since both ŷ_0, ν_P(A) > 0 we deduce x̂_B > 0. In addition, using (14) and the equality e^T D = 0,

‖x̂_B‖_1 = e^T x̂_B = e^T (x*_B ŷ_0 + D ŷ) = (e^T x*_B) ŷ_0 = ŷ_0.   (16)

Combining Eqs. (15) and (16), we obtain

min_{j∈B} x̂_j / ‖x̂_B‖_1 > ν_P(A) ŷ_0 / ŷ_0 = ν_P(A).   (17)

However,

x̂ / ‖x̂‖_1 ∈ P.   (18)
Eqs. (17) and (18) contradict the definition of ν_P(A). We thus conclude that L ∩ R^B_{++} = ∅ and, consequently, by the definition of Θ^P_{1∞} and (12),

Θ^P_{1∞}(A) ≤ dist_{1∞}(kernel(A_B^T), L) ≤ ν_P(A).  □

Proof of Theorem 1. Combining Lemmas 1 and 2, we have, for any matrix A ∈ R^{n×m},

Θ^P_{1∞}(A) = ν_P(A).   (19)

Let D be any matrix such that kernel(D^T) = range(A). Recall that P(A) = (B, N). Then

P(D) = (N, B),   (20)
range_N(A) = kernel(D_N^T),   (21)
ν_D(A) = ν_P(D).   (22)

By (19) applied to D, Θ^P_{1∞}(D) = ν_P(D). Let r := dim range_N(A) = dim kernel(D_N^T). Combining equalities (21) and (22),

ν_D(A) = Θ^P_{1∞}(D)
= inf{ dist_{1∞}(kernel(D_N^T), L) : L ∈ G^N_r, L ∩ R^N_{++} = ∅ }   (by the definition of Θ^P_{1∞})
= inf{ dist_{1∞}(range_N(A), L) : L ∈ G^N_r, L ∩ R^N_{++} = ∅ }   (by (21))
= Θ^D_{1∞}(A)   (by the definition of Θ^D_{1∞}).

Notice that in the expression L ∈ G^N_r in the second step above, the superscript N refers to the partition of A and not to the one of D. We conclude that

ν(A) = min{ν_P(A), ν_D(A)} = min{Θ^P_{1∞}(A), Θ^D_{1∞}(A)} = Θ_{1∞}(A).  □

4. Proof of Theorem 2

Lemma 3. For any matrix A ∈ R^{n×m}, ν(A) ≤ σ(A).

Proof. Assume B ≠ ∅; in particular, P ≠ ∅. Let j* be any index in B such that

max_{x∈P} x_{j*} = min_{j∈B} max_{x∈P} x_j = σ_P(A).

Let x* be any vector in P such that

min_{j∈B} x*_j = max_{x∈P} min_{j∈B} x_j = ν_P(A).

Using these equalities it follows that x*_{j*} ≤ σ_P(A) and x*_{j*} ≥ ν_P(A) and hence that σ_P(A) ≥ ν_P(A). Now assume B = ∅. Then ν_P(A) = σ_P(A) = ∞ and hence σ_P(A) ≥ ν_P(A) as well. Similarly, one proves σ_D(A) ≥ ν_D(A) and, hence, the statement.  □
Lemma 4. For any matrix A ∈ R^{n×m}, σ(A) ≤ nν(A).

Proof. For j ∈ B, let x^{(j)} be any vector in P such that x^{(j)}_j = max_{x∈P} x_j. Then σ_P(A) ≤ x^{(j)}_j for every j ∈ B. Let x̄ = Σ_{k∈B} x^{(k)}. Since x^{(k)} ∈ P, x^{(k)} ≥ 0 for all k ∈ B. Therefore, for j ∈ B,

x̄_j = Σ_{k∈B} x^{(k)}_j ≥ x^{(j)}_j ≥ σ_P(A).   (23)

Furthermore, ‖x^{(j)}‖_1 = 1 for all j ∈ B. Therefore,

‖x̄‖_1 ≤ Σ_{j∈B} ‖x^{(j)}‖_1 ≤ n.

By its definition, x̄ ∈ kernel(A^T) and x̄ ≥ 0. It follows that x̄/‖x̄‖_1 ∈ P. Hence, by (23),

ν_P(A) = max_{x∈P} min_{j∈B} x_j ≥ min_{j∈B} x̄_j / ‖x̄‖_1 ≥ σ_P(A)/n

and we conclude that ν_P(A) ≥ σ_P(A)/n. Similarly, one can prove ν_D(A) ≥ σ_D(A)/n.  □

Theorem 2 now follows from Lemmas 3 and 4.

5. Proof of Theorem 3

The following lemma shows that in the case p = q = 2 the function dist is indeed a distance. Since we have not been able to find a proof in the literature, we give one in the Appendix.

Lemma 5. For any L, L̃ ∈ G^n_m,

(i) dist(L, L̃) = max{ (x^T s)/(‖x‖ ‖s‖) : x ∈ L \ {0}, s ∈ L̃⊥ \ {0} } = dist(L̃⊥, L⊥), and
(ii) dist(L, L̃) = dist(L̃, L).

Lemma 6. For any matrix A ∈ R^{n×m} with B(A) ≠ ∅, ρ_B(A)/‖A_B‖ ≤ Θ^P(A).
Proof. Let L ∈ G^B_k be such that

dist(kernel(A_B^T), L) < ρ_B(A)/‖A_B‖.

We will show that L ∩ R^B_{++} ≠ ∅. Let L⊥ be the orthogonal complement of L in R^B. By Lemma 5,

dist(range(A_B), L⊥) < ρ_B(A)/‖A_B‖.   (24)

Let Ã_B be any matrix in R^{B×m} such that range(Ã_B) = L⊥ and let Ã_B† ∈ R^{m×B} be the pseudo-inverse of Ã_B. It is known that, for any s ∈ R^B, Ã_B† s is the least-squares solution of the system Ã_B y = s, i.e.,

‖Ã_B (Ã_B† s) − s‖ = min_{y∈R^m} ‖Ã_B y − s‖.

Let y be any vector in R^m and substitute s by A_B y in the equality above. We obtain

‖Ã_B (Ã_B† A_B y) − A_B y‖ = min_{ỹ∈R^m} ‖Ã_B ỹ − A_B y‖,

or yet

‖Ã_B (Ã_B† A_B y) − A_B y‖ / ‖A_B y‖ = min_{s∈range(Ã_B)} ‖s − A_B y‖ / ‖A_B y‖,

or yet

max_{y∈R^m} ‖(Ã_B Ã_B† A_B − A_B) y‖ / ‖A_B y‖ ≤ max_{s'∈range(A_B), s'≠0} min_{s∈range(Ã_B)} ‖s − s'‖ / ‖s'‖ = dist(range(A_B), range(Ã_B)).

Since this inequality holds for any y ∈ R^m, by the definition of operator norm,

‖Ã_B Ã_B† A_B − A_B‖ ≤ ‖A_B‖ dist(range(A_B), range(Ã_B)) < ρ_B(A),

the last by Eq. (24). This implies

‖Ã_B Ã_B† A_B − A_B‖ < ρ_B(A).

Let Ā_B = Ã_B Ã_B† A_B and Ā_N = A_N. Then kernel(Ā_B) ⊇ kernel(A_B) and

‖Ā − A‖ = ‖Ã_B Ã_B† A_B − A_B‖ < ρ_B(A).

By the definition of ρ_B(A), P(Ā) = P(A). Hence, there exists x̄_B ∈ R^B such that x̄_B > 0 and Ā_B^T x̄_B = 0. It follows that

0 = Ā_B^T x̄_B = A_B^T (Ã_B Ã_B†)^T x̄_B = A_B^T Ã_B Ã_B† x̄_B,

the last step by (2). Let x̃_B = Ã_B Ã_B† x̄_B. Clearly, x̃_B ∈ range(Ã_B). In addition, the equality above shows that x̃_B ∈ kernel(A_B^T).
Assume x̃_B ≠ 0. Then, by Lemma 5(i),

dist(range(A_B), range(Ã_B)) = max{ (x^T s)/(‖x‖ ‖s‖) : x ∈ range(Ã_B) \ {0}, s ∈ kernel(A_B^T) \ {0} } ≥ (x̃_B^T x̃_B)/(‖x̃_B‖ ‖x̃_B‖) = 1.

Combining this inequality with Eq. (24) we obtain ρ_B(A) > ‖A_B‖. This contradicts the definition of ρ_B(A). Therefore,

x̃_B = Ã_B Ã_B† x̄_B = 0.

Note that x̃_B = Ã_B Ã_B† x̄_B is the orthogonal projection of x̄_B onto range(Ã_B). The only possibility for this projection to be 0 is that x̄_B ∈ kernel(Ã_B^T), that is, since range(Ã_B) = L⊥, that x̄_B ∈ L. But x̄_B > 0 and hence L ∩ R^B_{++} ≠ ∅. We have thus proved that for all L ∈ G^B_k with dist(kernel(A_B^T), L) < ρ_B(A)/‖A_B‖ we have L ∩ R^B_{++} ≠ ∅. By the definition of Θ^P(A), this implies ρ_B(A)/‖A_B‖ ≤ Θ^P(A).  □

Lemma 7. For any matrix A ∈ R^{n×m} and s ∈ range(A), there exists w ∈ R^n such that s = AA^T w.

Proof. By hypothesis, there exists y ∈ R^m such that s = Ay. Let y_R ∈ range(A^T) and y_K ∈ kernel(A) be such that y = y_R + y_K. Then

s = Ay = A(y_R + y_K) = A y_R + A y_K = A y_R,

the last since y_K ∈ kernel(A). Now use that y_R ∈ range(A^T) to deduce the existence of w ∈ R^n such that y_R = A^T w, and conclude that s = AA^T w.  □

Lemma 8. For any matrix A ∈ R^{n×m} with B(A) ≠ ∅,

Θ^P(A) ≤ ρ_B(A) ‖A_B†‖ = κ†(A_B) ρ_B(A)/‖A_B‖.

Proof. Let Ã ∈ R^{n×m} be such that Ã_N = A_N, range(Ã_B^T) ⊆ range(A_B^T), and

‖Ã_B − A_B‖ < Θ^P(A)/‖A_B†‖.   (25)

We will show that P(Ã) = (B, N) = P(A). For w ∈ R^B,

‖Ã_B A_B^T w − A_B A_B^T w‖ = ‖(Ã_B − A_B) A_B^T w‖ ≤ ‖Ã_B − A_B‖ ‖A_B^T w‖

which implies

‖Ã_B A_B^T w − A_B A_B^T w‖ / ‖A_B A_B^T w‖ < Θ^P(A) ‖A_B^T w‖ / (‖A_B†‖ ‖A_B A_B^T w‖)
≤ Θ^P(A) ‖A_B^T w‖ / ‖A_B† A_B A_B^T w‖
= Θ^P(A) ‖A_B^T w‖ / ‖A_B^T A_B†^T A_B^T w‖   (by (2))
= Θ^P(A) ‖A_B^T w‖ / ‖A_B^T w‖   (by (2))
= Θ^P(A).
Thus, by Lemma 7,

dist(range(A_B), range(Ã_B)) = max_{s'∈range(A_B), s'≠0} min_{s∈range(Ã_B)} ‖s − s'‖ / ‖s'‖
= max_{w∈R^B} min_{s∈range(Ã_B)} ‖s − A_B A_B^T w‖ / ‖A_B A_B^T w‖
≤ max_{w∈R^B} ‖Ã_B A_B^T w − A_B A_B^T w‖ / ‖A_B A_B^T w‖ < Θ^P(A).

By Lemma 5,

dist(kernel(A_B^T), kernel(Ã_B^T)) = dist(range(A_B), range(Ã_B)) < Θ^P(A).   (26)

Since range(Ã_B^T) ⊆ range(A_B^T), rank(Ã_B) ≤ rank(A_B). On the other hand, it is known that

1/‖A†‖ = min{ ‖Ā − A‖ : rank(Ā) < rank(A) }.

Thus, from (25) it follows that rank(Ã_B) ≥ rank(A_B). Therefore, rank(Ã_B) = rank(A_B). So kernel(Ã_B^T) ∈ G^B_k and, by (26) and the definition of Θ^P, kernel(Ã_B^T) ∩ R^B_{++} ≠ ∅. Thus P(Ã) = (B, N). From the definition of ρ_B we finally obtain Θ^P(A)/‖A_B†‖ ≤ ρ_B(A).  □

The following proposition immediately follows from Lemmas 6 and 8.

Proposition 1. For any matrix A ∈ R^{n×m} with B(A) ≠ ∅,

ρ_B(A)/‖A_B‖ ≤ Θ^P(A) ≤ ρ_B(A) ‖A_B†‖ = κ†(A_B) ρ_B(A)/‖A_B‖.

We next proceed with the case N(A) ≠ ∅.

Lemma 9. For any matrix A ∈ R^{n×m} with N(A) ≠ ∅, ρ_N/‖A_N‖ ≤ Θ^D(A).

Proof. Let L ∈ G^N_r be such that

dist(range_N(A), L) < ρ_N/‖A_N‖.   (27)

We will show that L ∩ R^N_{++} ≠ ∅. Let Ã_N be any matrix in R^{N×m} such that range(Ã_N) = L. As in Lemma 6 we have

‖Ã_N (Ã_N† s) − s‖ = min_{y∈R^m} ‖Ã_N y − s‖.   (28)

Let y be any vector in R^m, h = dim kernel(A_B), and Z any matrix in R^{m×h} such that the columns of Z form an orthonormal basis for kernel(A_B), i.e., range(Z) = kernel(A_B) and Z^T Z = I. Substituting s by A_N Zy into Eq. (28) we obtain

‖Ã_N (Ã_N† A_N Zy) − A_N Zy‖ = min_{ỹ∈R^m} ‖Ã_N ỹ − A_N Zy‖ = min_{s∈range(Ã_N)} ‖s − A_N Zy‖
and, reasoning as in Lemma 6, we obtain

max_{y∈R^h} ‖Ã_N Ã_N† A_N Zy − A_N Zy‖ / ‖A_N Zy‖ ≤ max_{s'∈range_N(A), s'≠0} min_{s∈range(Ã_N)} ‖s − s'‖ / ‖s'‖.

By the definitions of operator norm and dist,

‖Ã_N Ã_N† A_N Z − A_N Z‖ ≤ ‖A_N Z‖ dist(range_N(A), range(Ã_N)) < ‖A_N Z‖ ρ_N/‖A_N‖,

the last by Eq. (27). Since ‖Z‖ = 1,

‖Ã_N Ã_N† A_N Z − A_N Z‖ < ρ_N.   (29)

Let Ā_N = Ã_N Ã_N† A_N and Ā_B = A_B. Then, multiplying by Z^T,

‖Ā − A‖ = ‖Ā_N − A_N‖ = ‖(Ã_N Ã_N† A_N Z − A_N Z) Z^T‖ ≤ ‖Ã_N Ã_N† A_N Z − A_N Z‖ ‖Z^T‖ = ‖Ã_N Ã_N† A_N Z − A_N Z‖ < ρ_N

by Eq. (29). By the definition of ρ_N, P(Ā) = P(A). Therefore, there exists y ∈ R^m such that Ā_N y > 0 and Ā_B y = 0. Since y ∈ kernel(A_B) = range(Z), there exists ȳ ∈ R^h such that y = Zȳ. In addition, by the definition of Ā_N,

Ā_N y = Ã_N Ã_N† A_N Zȳ.

Since Ā_N y > 0, range(Ã_N) ∩ R^N_{++} ≠ ∅. Also, since L = range(Ã_N), L ∩ R^N_{++} ≠ ∅. We have thus proved that for all L ∈ G^N_r with dist(range_N(A), L) < ρ_N/‖A_N‖ we have L ∩ R^N_{++} ≠ ∅. By the definition of Θ^D(A), this implies ρ_N/‖A_N‖ ≤ Θ^D(A).  □

Lemma 10. For any matrix A ∈ R^{n×m} with N(A) ≠ ∅, Θ^D(A) ≤ ρ_N ‖A_N†‖ = κ†(A_N) ρ_N/‖A_N‖.

Proof. Let Ã_B = A_B and Ã_N be any matrix in R^{N×m} such that

‖Ã_N − A_N‖ < Θ^D(A)/‖A_N†‖.   (30)

For y ∈ R^m,

‖Ã_N y − A_N y‖ = ‖(Ã_N − A_N) y‖ ≤ ‖Ã_N − A_N‖ ‖y‖ < Θ^D(A) ‖y‖/‖A_N†‖

which implies, using (2),

‖Ã_N y − A_N y‖ / ‖A_N y‖ < Θ^D(A) ‖y‖ / (‖A_N†‖ ‖A_N y‖) ≤ Θ^D(A) ‖y‖ / ‖A_N† A_N y‖ = Θ^D(A) ‖y‖ / ‖A_N^T A_N†^T y‖.

Since A_N^T A_N†^T y is the orthogonal projection of y on range(A_N^T), ‖A_N^T A_N†^T y‖ ≤ ‖y‖. It follows that

‖Ã_N y − A_N y‖ / ‖A_N y‖ < Θ^D(A).   (31)
Let Z be any matrix in R^{m×h} such that the columns of Z form an orthonormal basis for kernel(A_B). For y ∈ R^h, using (30) and ‖Z‖ = 1,

‖Ã_N Zy − A_N Zy‖ = ‖(Ã_N − A_N) Zy‖ < Θ^D(A) ‖y‖ / ‖A_N†‖

which implies

‖Ã_N Zy − A_N Zy‖ / ‖y‖ < Θ^D(A)/‖A_N†‖ ≤ 1/‖A_N†‖.

It follows that

‖Ã_N Z − A_N Z‖ = max_{y∈R^h, y≠0} ‖Ã_N Zy − A_N Zy‖ / ‖y‖ < 1/‖A_N†‖.

Using (2) it is easy to show that (A_N Z)† = Z^T A_N† and, therefore,

‖(A_N Z)†‖ = ‖Z^T A_N†‖ ≤ ‖A_N†‖,

and hence

‖Ã_N Z − A_N Z‖ < 1/‖(A_N Z)†‖.

Now use that

1/‖(A_N Z)†‖ = min{ ‖Ā − A_N Z‖ : rank(Ā) < rank(A_N Z) }

to deduce that rank(Ã_N Z) ≥ rank(A_N Z). This implies that

r = dim range_N(A) = dim range(A_N Z) ≤ dim range(Ã_N Z).

Let L be any linear subspace in G^N_r such that L ⊆ range(Ã_N Z). Then,

dist(range_N(A), L) = max_{s∈L, s≠0} min_{s'∈range_N(A)} ‖s − s'‖ / ‖s‖   (by Lemma 5(ii))
≤ max_{s∈range(Ã_N Z), s≠0} min_{s'∈range_N(A)} ‖s − s'‖ / ‖s‖
= max_{ỹ∈range(Z), Ã_N ỹ≠0} min_{y: A_B y=0} ‖A_N y − Ã_N ỹ‖ / ‖Ã_N ỹ‖
≤ max_{ỹ∈kernel(A_B), Ã_N ỹ≠0} ‖A_N ỹ − Ã_N ỹ‖ / ‖Ã_N ỹ‖   (since range(Z) = kernel(A_B))
< Θ^D(A)

by (31). By the definition of Θ^D, L ∩ R^N_{++} ≠ ∅. And since L ⊆ range(Ã_N Z), Ã_N(range(Z)) ∩ R^N_{++} ≠ ∅. In other words, there exists y ∈ R^m such that Ã_N y > 0 and Ã_B y = 0. This shows P(A) = P(Ã). By the definition of ρ_N, we finally obtain Θ^D(A)/‖A_N†‖ ≤ ρ_N.  □
Again, the following proposition immediately follows from Lemmas 9 and 10.

Proposition 2. For any matrix A ∈ R^{n×m} with N(A) ≠ ∅,

ρ_N(A)/‖A_N‖ ≤ Θ^D(A) ≤ ρ_N(A) ‖A_N†‖ = κ†(A_N) ρ_N(A)/‖A_N‖.

Theorem 3 now follows from Propositions 1 and 2.

6. Proof of Theorem 4

The main result in [5] (take α_j = 1 for j = 1, ..., n in Theorem 1 therein) states that for any norm ‖·‖_Y in R^m and Lin(R^m, R^n) endowed with the corresponding operator norm one has, if N ≠ ∅,

ρ_N(A) = max_{y∈L, y≠0} min_{j∈N} (a_j y)/‖y‖_Y.

Using that

‖A_N‖ = max_{‖y‖_Y=1} ‖A_N y‖_∞ = max_{j∈N} ‖a_j‖*_Y,

it follows that

v_N(A) = max_{y∈L, y≠0} min_{j∈N} (a_j y)/(‖a_j‖*_Y ‖y‖_Y) ≥ max_{y∈L, y≠0} min_{j∈N} (a_j y)/(max_{l∈N} ‖a_l‖*_Y ‖y‖_Y) = ρ_N(A)/‖A_N‖

as well as

v_N(A) ≤ max_{y∈L, y≠0} min_{j∈N} (a_j y)/(min_{l∈N} ‖a_l‖*_Y ‖y‖_Y) = ρ_N(A)/min_{j∈N} ‖a_j‖*_Y.

Theorem 1 in [5] (again with α_j = 1 for j = 1, ..., n) also shows that, if B ≠ ∅,

ρ_B(A) = −max_{y∈L̄, y≠0} min_{j∈B} (a_j y)/‖y‖_Y

and, reasoning as above, it follows that

ρ_B(A)/‖A_B‖ ≤ −v_B(A) ≤ ρ_B(A)/min_{j∈B} ‖a_j‖*_Y.

The conclusion of Theorem 4 is now immediate.

Acknowledgments

The second author was partially supported by CERG grant CityU 00707. The third author was partially supported by NSF grant CCF-0830533.
Appendix

Lemma 11. For any L, L̃ ∈ G^n_m and x ∈ R^n,

min_{x̃∈L̃} ‖x − x̃‖ = max_{s∈L̃⊥} (x^T s)/‖s‖.

Proof. Let x̄ be any vector in L̃ such that

‖x − x̄‖ = min_{x̃∈L̃} ‖x − x̃‖.   (32)

It is known that (x − x̄) ⊥ L̃. Therefore,

max_{s∈L̃⊥} (x^T s)/‖s‖ ≥ x^T (x − x̄)/‖x − x̄‖
= (x − x̄ + x̄)^T (x − x̄)/‖x − x̄‖
= (x − x̄)^T (x − x̄)/‖x − x̄‖ + x̄^T (x − x̄)/‖x − x̄‖
= ‖x − x̄‖   (since x̄ ∈ L̃ and (x − x̄) ⊥ L̃)
= min_{x̃∈L̃} ‖x − x̃‖   (by (32)).

On the other hand, let s̄ be any vector in L̃⊥ such that

(x^T s̄)/‖s̄‖ = max_{s∈L̃⊥} (x^T s)/‖s‖.   (33)

Since x̄ ∈ L̃ and s̄ ∈ L̃⊥, x̄^T s̄ = 0 and hence

(x^T s̄)/‖s̄‖ = ((x − x̄)^T s̄)/‖s̄‖ ≤ ‖x − x̄‖ ‖s̄‖/‖s̄‖ = ‖x − x̄‖ = min_{x̃∈L̃} ‖x − x̃‖,

using (32) and (33). Combining both inequalities proves the lemma.  □

Proof of Lemma 5(i).

dist(L, L̃) = max_{x∈L, x≠0} min_{x̃∈L̃} ‖x − x̃‖/‖x‖
= max_{x∈L, x≠0} max_{s∈L̃⊥, s≠0} (x^T s)/(‖x‖ ‖s‖)   (by Lemma 11)
= max_{s∈L̃⊥, s≠0} max_{x∈L, x≠0} (s^T x)/(‖s‖ ‖x‖)
= max_{s∈L̃⊥, s≠0} min_{x̃∈L⊥} ‖s − x̃‖/‖s‖   (by Lemma 11)
= dist(L̃⊥, L⊥).  □
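Lemma 5 can be sanity-checked numerically: with orthonormal bases, dist(L, L̃) = ‖(I − Q̃Q̃^T)Q‖_2, and bases of the orthogonal complements come out of a full QR factorization. A sketch assuming NumPy (an illustration, not part of the paper):

```python
import numpy as np

def dist(Q, Qt):
    # dist(L, Lt) for p = q = 2; columns of Q, Qt are orthonormal bases.
    n = Q.shape[0]
    return np.linalg.norm((np.eye(n) - Qt @ Qt.T) @ Q, 2)

rng = np.random.default_rng(1)
n, m = 6, 2
# Random L, Lt in G^n_m; a full QR gives bases of both L and L-perp.
F = np.linalg.qr(rng.standard_normal((n, m)), mode="complete")[0]
G = np.linalg.qr(rng.standard_normal((n, m)), mode="complete")[0]
Q, Qperp = F[:, :m], F[:, m:]
Qt, Qtperp = G[:, :m], G[:, m:]

assert np.isclose(dist(Q, Qt), dist(Qtperp, Qperp))   # Lemma 5(i)
assert np.isclose(dist(Q, Qt), dist(Qt, Q))           # Lemma 5(ii)
```

The specific random subspaces are an arbitrary choice; both identities hold for any pair of equal-dimensional subspaces, which is exactly what the lemma asserts.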
Lemma 12. For any L, L̃ ∈ G^n_m, if dist(L, L̃) < 1, then L ∩ L̃⊥ = {0}. In particular, L + L̃⊥ = R^n.

Proof. Suppose there exists x ∈ L ∩ L̃⊥, x ≠ 0. Then, by Lemma 5(i),

dist(L, L̃) = max{ (x'^T s)/(‖x'‖ ‖s‖) : x' ∈ L \ {0}, s ∈ L̃⊥ \ {0} } ≥ (x^T x)/(‖x‖ ‖x‖) = 1,

in contradiction with our hypothesis. So L ∩ L̃⊥ = {0}. The second statement follows from the fact that L and L̃⊥ have dimensions m and n − m, respectively.  □

Proof of Lemma 5(ii). Let us consider the following two cases: (i) dist(L, L̃) ≥ 1 and (ii) dist(L, L̃) < 1.

In case (i), by the definition of dist (taking x = 0 in the inner minimum),

dist(L̃, L) = max_{x̃∈L̃, x̃≠0} min_{x∈L} ‖x̃ − x‖/‖x̃‖ ≤ max_{x̃∈L̃, x̃≠0} ‖x̃‖/‖x̃‖ = 1 ≤ dist(L, L̃).   (34)

Let us consider case (ii). Let x̃ be any vector in L̃, x̃ ≠ 0, such that

min_{x∈L} ‖x̃ − x‖/‖x̃‖ = dist(L̃, L).

Since dist(L, L̃) < 1, by Lemma 12 there exist x ∈ L and s ∈ L̃⊥ such that x̃ = x + s. Since x̃ ∈ L̃ and s ∈ L̃⊥ we have x̃^T s = 0, and hence x^T s = −‖s‖^2 and ‖x‖^2 = ‖x̃‖^2 + ‖s‖^2. Using x ∈ L and x̃^T x = x̃^T (x̃ − s) = ‖x̃‖^2,

dist(L̃, L) ≤ ‖x̃ − (‖x̃‖^2/‖x‖^2) x‖ / ‖x̃‖ = √(‖x̃‖^2 − ‖x̃‖^4/‖x‖^2) / ‖x̃‖ = √(1 − ‖x̃‖^2/‖x‖^2) = ‖s‖/‖x‖ = ‖s‖/√(‖x̃‖^2 + ‖s‖^2).

In addition, by Lemma 5(i), since x ∈ L, −s ∈ L̃⊥, and x^T s = −‖s‖^2,

dist(L, L̃) ≥ (x^T (−s))/(‖x‖ ‖s‖) = ‖s‖^2/(‖x‖ ‖s‖) = ‖s‖/‖x‖.
In conclusion, for any L, L̃ ∈ G^n_m (no matter whether or not dist(L, L̃) < 1), dist(L̃, L) ≤ dist(L, L̃). Similarly, one can show that dist(L, L̃) ≤ dist(L̃, L). We thus conclude that dist(L, L̃) = dist(L̃, L).  □

References

[1] A. Ben-Israel, T.N.E. Greville, Generalized Inverses: Theory and Applications, 2nd edition, Springer-Verlag, 2003.
[2] S.L. Campbell, C.D. Meyer, Generalized Inverses of Linear Transformations, Pitman, 1979.
[3] D. Cheung, F. Cucker, A new condition number for linear programming, Math. Program. 91 (2001) 163–174.
[4] D. Cheung, F. Cucker, J. Peña, Unifying condition numbers for linear programming, Math. Oper. Res. 28 (2003) 609–624.
[5] D. Cheung, F. Cucker, J. Peña, On strata of degenerate polyhedral cones. I: Condition and distance to strata, European J. Oper. Res. 198 (2009) 23–28.
[6] G. Golub, C. Van Loan, Matrix Computations, 3rd edition, Johns Hopkins Univ. Press, 1996.
[7] J. Renegar, Some perturbation theory for linear programming, Math. Program. 65 (1994) 73–91.
[8] J. Renegar, Incorporating condition measures into the complexity theory of linear programming, SIAM J. Optim. 5 (1995) 506–524.
[9] J. Renegar, Linear programming, complexity theory and elementary functional analysis, Math. Program. 70 (1995) 279–351.
[10] Y. Ye, Toward probabilistic analysis of interior-point algorithms for linear programming, Math. Oper. Res. 19 (1994) 38–52.