Supplement: Hoffman's Error Bounds
In Lecture 1 we learned that a linear program and its dual problem
\[
(P)\quad \min\ c^T x \ \ \text{s.t.}\ Ax = b,\ x \ge 0; \qquad (D)\quad \max\ b^T y \ \ \text{s.t.}\ A^T y + s = c,\ s \ge 0,
\]
under the Slater condition, admit the analytic central path
\[
\{(x(\mu), y(\mu), s(\mu)) \mid Ax(\mu) = b,\ A^T y(\mu) + s(\mu) = c,\ x(\mu) > 0,\ s(\mu) > 0,\ x_i(\mu) s_i(\mu) = \mu \text{ for } i = 1, \ldots, n;\ \mu > 0\},
\]
and that $\lim_{\mu \downarrow 0} (x(\mu), y(\mu), s(\mu)) = (x(0), y(0), s(0))$ exists, the limits being optimal solutions for $(P)$ and $(D)$ respectively.
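As a quick numerical companion (an illustration added to these notes, not part of the original lecture), the sketch below traces the central path of a small random instance by applying Newton's method to the central-path equations and driving $\mu \downarrow 0$; the instance, tolerances, and all names in the code are illustrative assumptions.

\begin{verbatim}
# Minimal central-path sketch (illustrative; the random instance below is
# primal and dual strictly feasible by construction).
import numpy as np

rng = np.random.default_rng(0)
m, n = 2, 5
A = rng.standard_normal((m, n))
x0 = np.ones(n)              # strictly positive primal point by construction
b = A @ x0                   # so Ax = b, x > 0 is consistent
c = np.ones(n)               # c = e as in the lecture; y = 0, s = e is dual feasible

def newton_center(A, b, c, mu, x, y, s, iters=50):
    """Newton's method on Ax = b, A^T y + s = c, x_i s_i = mu."""
    m, n = A.shape
    for _ in range(iters):
        r = np.concatenate([A @ x - b, A.T @ y + s - c, x * s - mu])
        J = np.block([[A, np.zeros((m, m)), np.zeros((m, n))],
                      [np.zeros((n, n)), A.T, np.eye(n)],
                      [np.diag(s), np.zeros((n, m)), np.diag(x)]])
        d = np.linalg.solve(J, -r)
        dx, dy, ds = d[:n], d[n:n+m], d[n+m:]
        t = 1.0                       # damped step keeps x, s strictly positive
        while np.any(x + t*dx <= 0) or np.any(s + t*ds <= 0):
            t *= 0.5
        x, y, s = x + t*dx, y + t*dy, s + t*ds
    return x, y, s

x, y, s = x0.copy(), np.zeros(m), np.ones(n)
for mu in [1.0, 0.1, 0.01, 1e-4, 1e-6]:
    x, y, s = newton_center(A, b, c, mu, x, y, s)
    print(f"mu={mu:8.1e}  ||x||={np.linalg.norm(x):.4f}  gap={x @ s:.2e}")
\end{verbatim}

The printed duality gap equals $x(\mu)^T s(\mu) = n\mu$ and shrinks linearly in $\mu$, while $\|x(\mu)\|$ settles down to the norm of the limit point $x(0)$.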
Now let $c = e$. One can easily show that
\[
y(\mu) = (AX(\mu)A^T)^{-1} b - \mu (AX(\mu)A^T)^{-1} A e,
\]
and
\[
x(\mu) = X(\mu) A^T (AX(\mu)A^T)^{-1} b + \mu e - \mu X(\mu) A^T (AX(\mu)A^T)^{-1} A e. \qquad (1)
\]
But why write it in this particular way? There is an amazing fact to note here (due to Dikin, Stewart, and Todd):
\[
\chi(A) := \sup\{ \| D A^T (A D A^T)^{-1} \| \mid D \text{ diagonal and } D \succ 0 \} < \infty.
\]
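To make the finiteness claim concrete, here is a small experiment (an added illustration, with an arbitrary random $A$) that samples positive diagonal matrices $D$ whose entries span twelve orders of magnitude and evaluates $\|D A^T (A D A^T)^{-1}\|$; the sampled values remain bounded no matter how degenerate $D$ becomes.

\begin{verbatim}
# Sample ||D A^T (A D A^T)^{-1}|| over wildly scaled positive diagonal D;
# the supremum chi(A) is finite (Dikin, Stewart, Todd), so the printed
# maximum stabilizes instead of blowing up.  Illustrative sketch only.
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 6
A = rng.standard_normal((m, n))   # generically of full row rank

worst = 0.0
for _ in range(20000):
    d = 10.0 ** rng.uniform(-6, 6, size=n)   # entries across 12 orders of magnitude
    M = (d[:, None] * A.T) @ np.linalg.inv(A @ (d[:, None] * A.T))
    worst = max(worst, np.linalg.norm(M, 2))
print("max sampled ||D A^T (A D A^T)^{-1}||_2 =", worst)
\end{verbatim}

Despite $D$ nearly collapsing onto coordinate subspaces, the printed maximum stabilizes; Theorem 1 below identifies the limiting value.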
Let us try to understand why $\chi(A)$ is a finite number. Another way of writing $\chi(A)$ is the following:
\[
\chi(A) = \sup\left\{ \frac{\|y\|}{\|c\|} \;\middle|\; y = \operatorname{argmin}_y \|D^{1/2}(A^T y - c)\|,\ D \succ 0 \text{ diagonal},\ 0 \ne c \in \mathbb{R}^n \right\}.
\]
Denote
\[
\lambda(A) = \max\{ \|A_I^{-1}\| \mid |I| = m \text{ with } A_I \text{ invertible} \}.
\]
Clearly, $\lambda(A)$ is finite.

Theorem 1. $\chi(A) = \lambda(A)$.
Proof. For any $I$ with $|I| = m$ and $A_I$ non-singular, let $D^\epsilon$ be diagonal with $D^\epsilon_{ii} = 1$ for $i \in I$ and $D^\epsilon_{ii} = \epsilon$ for $i \notin I$. Clearly $\|D^\epsilon A^T (A D^\epsilon A^T)^{-1}\| \to \|A_I^{-1}\|$ as $\epsilon \downarrow 0$, and so $\lambda(A) \le \chi(A)$.

To show $\chi(A) \le \lambda(A)$, fix $0 \ne c \in \mathbb{R}^n$ and a positive diagonal matrix $D$, and consider the unique minimizer $y(c, D) = (ADA^T)^{-1} A D c$ of $\|D^{1/2}(A^T y - c)\|$. Using the Cauchy–Binet formula one can verify that $y(c, D)$ is a convex combination of the basic points $A_J^{-T} c_J$:
\[
y(c, D) = \sum_J w_J(D)\, A_J^{-T} c_J, \qquad w_J(D) = \frac{\det(A_J)^2 \prod_{i \in J} D_{ii}}{\sum_{J'} \det(A_{J'})^2 \prod_{i \in J'} D_{ii}} \ge 0,
\]
where $J$ ranges over all index sets with $|J| = m$ and $A_J$ non-singular. This shows that
\[
\chi(A) \le \sup\{ \|A_J^{-T} c_J\| / \|c\| \mid 0 \ne c \in \mathbb{R}^n,\ |J| = m \text{ and } A_J \text{ non-singular} \} \le \sup\{ \|A_J^{-1}\| \mid |J| = m \text{ and } A_J \text{ non-singular} \} = \lambda(A).
\]
Combining the two inequalities, the theorem follows.
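Theorem 1 is easy to test numerically: the sketch below (an added illustration, reusing the same random $A$ as the previous snippet) computes $\lambda(A)$ by enumerating all invertible $m \times m$ column submatrices and checks that the sampled values of $\|D A^T (A D A^T)^{-1}\|$ never exceed it.

\begin{verbatim}
# Compute lambda(A) = max ||A_I^{-1}|| over invertible m-column submatrices,
# and confirm that ||D A^T (A D A^T)^{-1}|| never exceeds it (Theorem 1).
import itertools
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 6
A = rng.standard_normal((m, n))   # same A as in the previous snippet

lam = 0.0
for I in itertools.combinations(range(n), m):
    AI = A[:, I]
    if abs(np.linalg.det(AI)) > 1e-12:         # A_I invertible
        lam = max(lam, np.linalg.norm(np.linalg.inv(AI), 2))
print("lambda(A) =", lam)

worst = 0.0
for _ in range(20000):
    d = 10.0 ** rng.uniform(-6, 6, size=n)
    M = (d[:, None] * A.T) @ np.linalg.inv(A @ (d[:, None] * A.T))
    worst = max(worst, np.linalg.norm(M, 2))
print("sampled chi(A) =", worst, " <= lambda(A):", worst <= lam + 1e-8)
\end{verbatim}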
A related quantity is
\[
\bar\chi(A) := \sup\{ \| D A^T (A D A^T)^{-1} A \| \mid D \text{ diagonal and } D \succ 0 \} < \infty.
\]
These quantities play an important role in the complexity analysis of linear programming. Continuing from (1), the triangle inequality applied to the three terms of $x(\mu)$ gives
\[
\|x(\mu)\| \le \chi(A) \|b\| + \sqrt{n}\,\mu + \sqrt{n}\,\bar\chi(A)\,\mu.
\]
Therefore, $\|x(0)\| \le \chi(A) \|b\|$.
But we assumed the Slater condition. What if the Slater condition does not hold, though the problem itself is still feasible? Let $\delta > 0$ and consider $\{x \mid Ax = b + \delta A e,\ x \ge 0\}$. This perturbed system always satisfies the Slater condition: if $\bar x$ is feasible for the original system, then $\bar x + \delta e > 0$ is feasible for the perturbed one. We then know that for every $\delta > 0$ there is $x^\delta \ge 0$ such that
\[
A x^\delta = b + \delta A e, \qquad \|x^\delta\| \le \chi(A) \|b + \delta A e\|.
\]
Therefore, by taking the limit $\delta \downarrow 0$ (along a subsequence if necessary), there is always a feasible solution $x$ satisfying the bound $\|x\| \le \chi(A) \|b\|$.

Theorem 2. If the linear program $(P)$ is feasible, then it has a feasible solution whose norm is no more than $\chi(A)\|b\|$; if it has an optimal solution, then it has an optimal solution whose norm is no more than $\chi(A)\|b\|$.
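Theorem 2 can be sanity-checked as follows (an added illustration; the solver choice and instance are arbitrary assumptions): compute $\chi(A) = \lambda(A)$ by enumeration, find the minimum-norm point of $\{x \mid Ax = b,\ x \ge 0\}$ with a general-purpose solver, and compare against $\chi(A)\|b\|$.

\begin{verbatim}
# Check Theorem 2: the minimum-norm feasible point of {Ax = b, x >= 0}
# has norm at most chi(A)*||b||.  Illustrative sketch using scipy.
import itertools
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
m, n = 2, 5
A = rng.standard_normal((m, n))
b = A @ rng.uniform(0.5, 1.5, size=n)        # feasible by construction

chi = max(np.linalg.norm(np.linalg.inv(A[:, I]), 2)
          for I in itertools.combinations(range(n), m)
          if abs(np.linalg.det(A[:, I])) > 1e-12)   # chi(A) = lambda(A), Thm 1

res = minimize(lambda x: x @ x, np.ones(n), method='SLSQP',
               bounds=[(0, None)] * n,
               constraints=[{'type': 'eq', 'fun': lambda x: A @ x - b}])
print("min-norm feasible ||x|| =", np.linalg.norm(res.x))
print("bound chi(A)*||b||      =", chi * np.linalg.norm(b))
\end{verbatim}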
Another fact follows immediately.

Lemma 1. Let $J$ be a subset of $\{1, 2, \ldots, n\}$. Denote by $A_J$ (resp. $x_J$) the submatrix of $A$ (resp. subvector of $x$) collecting the columns of $A$ (resp. components of $x$) whose indices belong to $J$. Suppose that $A_J x_J = b,\ x_J \ge 0$ is feasible. Then it always has a feasible solution $\bar x_J$ such that $\|\bar x_J\| \le \chi(A) \|b\|$.

One way to see this is to observe that the linear program
\[
(\bar P)\quad \min\ e_{\bar J}^T x_{\bar J} \quad \text{s.t.}\ Ax = b,\ x \ge 0,
\]
where $\bar J$ is the complement of $J$, is feasible and has optimal value $0$, so every optimal solution satisfies $x_{\bar J} = 0$. Applying Theorem 2, the result follows.
Now let us consider the following problem. Suppose that $S = \{y \mid A^T y \le c\}$, and let $z \in \mathbb{R}^m$ be a point not in $S$. The question is: can we reasonably estimate the distance from $z$ to $S$? This is the point where the issue of error bounds arises: essentially, we wish to have some computable measure $f(z)$ which tells us something about the unknown quantity $\mathrm{dist}(z, S)$. Consider
\[
\min\ \tfrac{1}{2} \|z - y\|^2 \quad \text{s.t.}\ A^T y \le c.
\]
Let $y^*$ be the optimal solution (the projection of $z$ onto $S$). Applying the KKT conditions, there exist an index set $J \subseteq \{1, 2, \ldots, n\}$ (with $\bar J$ its complement) and vectors $x, s$ such that
\[
z - y^* = A_J x_J, \quad s = c - A^T y^*, \quad x \ge 0, \quad s \ge 0, \quad s^T x = 0, \quad s_J = 0, \quad x_{\bar J} = 0.
\]
In fact, once the index set $J$ is identified, we may choose any $x_J \ge 0$ satisfying $z - y^* = A_J x_J$, and the above KKT conditions still ensure that $y^*$ is the projection. In particular, by Lemma 1 there is a short solution $\bar x$ with $\|\bar x\| \le \chi(A) \|y^* - z\|$.
Putting things together, and noting that $\bar x$ is supported on $J$ so that $(c - A^T y^*)^T \bar x = s^T \bar x = 0$, we have
\[
\|y^* - z\|^2 = (z - y^*)^T A \bar x = (A^T z - A^T y^*)^T \bar x = (A^T z - c)^T \bar x + (c - A^T y^*)^T \bar x \le ((A^T z - c)^+)^T \bar x \le \|(A^T z - c)^+\| \, \|\bar x\| \le \chi(A) \|(A^T z - c)^+\| \, \|y^* - z\|.
\]
Therefore, $\|y^* - z\| \le \chi(A) \|(A^T z - c)^+\|$. This gives rise to an important result, known as Hoffman's error bound:

Theorem 3. Suppose that $S = \{y \mid A^T y \le c\} \ne \emptyset$. Then
\[
\mathrm{dist}(z, S) \le \chi(A) \|(A^T z - c)^+\| \quad \text{for all } z.
\]
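Hoffman's bound is also easy to verify empirically. The sketch below (an added illustration; the instance and solver are arbitrary choices) projects random points $z$ onto $S = \{y \mid A^T y \le c\}$ and checks $\mathrm{dist}(z, S) \le \chi(A) \|(A^T z - c)^+\|$, with $\chi(A)$ computed by the enumeration of Theorem 1.

\begin{verbatim}
# Numerically verify Hoffman's bound dist(z,S) <= chi(A)*||(A^T z - c)^+||
# for S = {y : A^T y <= c}.  Illustrative sketch using scipy.
import itertools
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
m, n = 2, 5
A = rng.standard_normal((m, n))
c = A.T @ rng.standard_normal(m) + rng.uniform(0.1, 1.0, n)   # S is nonempty

chi = max(np.linalg.norm(np.linalg.inv(A[:, I]), 2)
          for I in itertools.combinations(range(n), m)
          if abs(np.linalg.det(A[:, I])) > 1e-12)

for _ in range(5):
    z = 5.0 * rng.standard_normal(m)
    proj = minimize(lambda y: (y - z) @ (y - z), z, method='SLSQP',
                    constraints=[{'type': 'ineq', 'fun': lambda y: c - A.T @ y}])
    dist = np.linalg.norm(proj.x - z)
    bound = chi * np.linalg.norm(np.maximum(A.T @ z - c, 0.0))
    print(f"dist = {dist:.4f}  <=  chi*||(A^T z - c)^+|| = {bound:.4f}")
\end{verbatim}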
It is easy to check that there is $C > 0$ such that $\|(A^T z - c)^+\| \le C \,\mathrm{dist}(z, S)$: for any $y \in S$ we have $A^T z - c \le A^T(z - y)$ componentwise, hence $\|(A^T z - c)^+\| \le \|A^T (z - y)\| \le \|A\| \, \|z - y\|$, and taking $y$ to be the projection of $z$ shows that $C = \|A\|$ works. Therefore the two quantities are of the same order: $\mathrm{dist}(z, S) = \Theta(\|(A^T z - c)^+\|)$.

An interesting result related to Hoffman's error bound is the following: if an affine subspace $\mathcal{A}$ and the polyhedral cone $\mathbb{R}^n_+$ do not intersect, then there must be a positive distance between them; moreover, there are two points $\hat x \in \mathcal{A}$ and $\hat y \in \mathbb{R}^n_+$ such that $\|\hat x - \hat y\| = \mathrm{dist}(\mathcal{A}, \mathbb{R}^n_+)$. To show this, it is sufficient to prove a slightly more general result:

Lemma 2. Suppose that $Q \succeq 0$. Let $\epsilon_k \downarrow 0$ be a sequence, and let $P$ be a polyhedron. Suppose that $\{x \in P \mid x^T Q x + c^T x \le \epsilon_k\} \ne \emptyset$ for all $k$. Then $\{x \in P \mid x^T Q x + c^T x \le 0\} \ne \emptyset$.
Proof. Let $x^k = \operatorname{argmin}\{ \|x\| \mid x^T Q x + c^T x \le \epsilon_k,\ x \in P \}$, $k = 1, 2, \ldots$. If $\{x^k \mid k = 1, 2, \ldots\}$ contains a bounded subsequence, then it has a finite cluster point, which lies in the set $\{x \in P \mid x^T Q x + c^T x \le 0\}$, and we are done. Let us consider the case where $\|x^k\| \to \infty$. Then there is a subsequence $K$ such that $\lim_{k \in K} x^k / \|x^k\| = d$. Since
\[
\left( \frac{x^k}{\|x^k\|} \right)^T Q \left( \frac{x^k}{\|x^k\|} \right) + \frac{c^T x^k}{\|x^k\|^2} \le \frac{\epsilon_k}{\|x^k\|^2},
\]
letting $k \in K$ tend to infinity yields $d^T Q d \le 0$; since $Q \succeq 0$, this forces $Qd = 0$.
Without loss of generality, write $P = \{x \mid Ax = b,\ x \ge 0\}$. For each $k$, consider the system
\[
Qx = Qx^k, \qquad Ax = b, \qquad x \ge 0.
\]
Clearly it is feasible ($x^k$ itself is a solution). By Theorem 2, applied with the stacked constraint matrix $\binom{Q}{A}$, the system has a feasible solution $y^k$ such that
\[
\|y^k\| \le \chi\!\left(\binom{Q}{A}\right) \left( \|Q x^k\| + \|b\| \right).
\]
Since $Qx^k / \|x^k\| \to Qd = 0$ along $K$, dividing both sides by $\|x^k\|$ yields $\|y^k\| / \|x^k\| \to 0$ as $k \in K$ tends to infinity, which contradicts the fact that $x^k$ has the smallest norm.
As a consequence, the shortest distance problem
\[
\min\ \|x - y\|^2 \quad \text{s.t.}\ Ax = b,\ y \in \mathbb{R}^n_+
\]
always has an attainable optimal solution. Therefore, if $\{x \mid Ax = b,\ x \in \mathbb{R}^n_+\} = \emptyset$, then we can strictly separate the affine space $\{x \mid Ax = b\}$ from the cone $\mathbb{R}^n_+$; that is, there are $\lambda \in \mathbb{R}^n$ and $\gamma \in \mathbb{R}$ such that $\lambda^T x < \gamma$ for all $x$ with $Ax = b$, and $\lambda^T x > \gamma$ for all $x \in \mathbb{R}^n_+$. This implies: (i) $\lambda \ge 0$, for otherwise $\lambda^T x$ would be unbounded below on $\mathbb{R}^n_+$; (ii) $\gamma < 0$, by taking $x = 0 \in \mathbb{R}^n_+$; (iii) $\lambda = A^T y$ for some $y \in \mathbb{R}^m$, since $\lambda$ must be orthogonal to the null space of $A$ for $\lambda^T x$ to be bounded above on the affine space; (iv) $b^T y = \lambda^T x < \gamma < 0$ for every $x$ with $Ax = b$. In other words, if $Ax = b,\ x \ge 0$ is infeasible, then there exists $y$ with $A^T y \ge 0$ and $b^T y < 0$. The above is the famous Farkas lemma!
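To see the Farkas lemma in action, the sketch below (an added illustration) takes an obviously infeasible system $Ax = b,\ x \ge 0$ and recovers a certificate $y$ with $A^T y \ge 0$ and $b^T y < 0$ by solving a small auxiliary LP; the normalization constraint $b^T y \ge -1$ is just a device to keep the LP bounded.

\begin{verbatim}
# Find a Farkas certificate y with A^T y >= 0 and b^T y < 0 for an
# infeasible system Ax = b, x >= 0.  Illustrative sketch using scipy.
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 1.0]])
b = np.array([-1.0])          # x1 + x2 = -1, x >= 0 is clearly infeasible

# LP: min b^T y  s.t.  -A^T y <= 0  (i.e. A^T y >= 0)  and  b^T y >= -1.
m = A.shape[0]
res = linprog(c=b,
              A_ub=np.vstack([-A.T, -b.reshape(1, -1)]),
              b_ub=np.concatenate([np.zeros(A.shape[1]), [1.0]]),
              bounds=[(None, None)] * m)
y = res.x
print("certificate y =", y, " A^T y =", A.T @ y, " b^T y =", b @ y)
\end{verbatim}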
One important implication of this analysis is that the projection of a polyhedron is always closed. In fact, more is true:

Theorem 4. Let $L$ be any affine mapping and let $P$ be a polyhedron. Then $L(P)$ is itself a polyhedron.
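By contrast, without polyhedrality closedness can be lost under projection. A standard example (added here for illustration, not from the original notes): the set
\[
C = \{(x_1, x_2) \mid x_1 > 0,\ x_1 x_2 \ge 1\}
\]
is closed and convex, yet its projection onto the first coordinate is the open ray $\{x_1 \mid x_1 > 0\}$, which is not closed. Theorem 4 says this cannot happen when the set is a polyhedron.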
Key References:

J.-S. Pang, Error Bounds in Mathematical Programming, Mathematical Programming, 79, 299–332, 1997.

S. Zhang, Global Error Bounds for Convex Conic Problems, SIAM Journal on Optimization, 10, 836–851, 2000.

Z.-Q. Luo and S. Zhang, On Extensions of the Frank-Wolfe Theorems, Computational Optimization and Applications, 13, 87–110, 1999.