Some preconditioners for systems of linear inequalities


Javier Peña*        Vera Roshchina†        Negar Soheili‡

June 2013

Abstract

We show that a combination of two simple preprocessing steps would generally improve the conditioning of a homogeneous system of linear inequalities. Our approach is based on a comparison among three different but related notions of conditioning for linear inequalities.

1 Introduction

Condition numbers play an important role in numerical analysis. The condition number of a problem is a key parameter in the computational complexity of iterative algorithms as well as in issues of numerical stability. A related challenge of paramount importance is to precondition a given problem instance, that is, to perform some kind of data preprocessing to transform a given problem instance into an equivalent one that is better conditioned. Preconditioning has been extensively studied in numerical linear algebra [10, 15] and is an integral part of the computational implementation of numerous algorithms for solving linear systems of equations. In the more general optimization context, the task of designing preconditioners has been studied by Epelman and Freund [6] as well as by Belloni and Freund [3]. In a similar spirit, we propose two simple preconditioning procedures for homogeneous systems of linear inequalities. As we formalize in the sequel, these procedures lead to improvements in three types of condition numbers for homogeneous systems of linear inequalities, namely Renegar's condition number [12], the Goffin-Cheung-Cucker condition number [5, 9], and the Grassmann condition number [1, 4].

* Tepper School of Business, Carnegie Mellon University, USA, jfp@andrew.cmu.edu
† Collaborative Research Network, University of Ballarat, AUSTRALIA, vroshchina@ballarat.edu.au
‡ Tepper School of Business, Carnegie Mellon University, USA, nsoheili@andrew.cmu.edu

Both Renegar's and the Goffin-Cheung-Cucker condition numbers are key parameters in the analyses of algorithmic schemes and other numerical properties of constraint systems [7, 8, 14]. The more recently developed Grassmann condition number is especially well suited for probabilistic analysis [2]. We recall the definitions of these condition numbers in Section 2 below.

These condition numbers quantify certain properties associated with the homogeneous systems of inequalities (1) and (2) defined by a matrix A ∈ ℝ^{m×n}, where m ≤ n. Renegar's condition number is defined in terms of the distance from A to a set of ill-posed instances in the space ℝ^{m×n}. The Goffin-Cheung-Cucker condition number is defined in terms of the best conditioned solution to the system defined by A; alternatively, it can be seen as the reciprocal of a measure of thickness of the cone of feasible solutions to that system. The Goffin-Cheung-Cucker condition number is intrinsic to certain geometry in the column space of A. The more recent Grassmann condition number, introduced in a special case by Belloni and Freund [4] and subsequently generalized by Amelunxen and Bürgisser [1], is based on the projection distance from the linear subspace spanned by the rows of A to a set of ill-posed subspaces in ℝ^n. The Grassmann condition number is intrinsic to certain geometry in the row space of A.

The Goffin-Cheung-Cucker condition number and the Grassmann condition number are invariant under column scaling and under elementary row operations on the matrix A, respectively. For suitable choices of norms, each of them is also always smaller than Renegar's condition number (see (4) and (5) below). We observe that these properties have an interesting parallel and naturally suggest two preconditioning procedures, namely column normalization and row balancing. Our main results, presented in Section 3, discuss some interesting properties of these two preconditioning procedures. In particular, we show that a combination of them would improve the values of the three condition numbers.

2 Condition numbers and their relations

Given a matrix A ∈ ℝ^{m×n} with m ≤ n, consider the homogeneous feasibility problem

    Ax = 0,  x ≥ 0,  x ≠ 0,        (1)

and its alternative

    Aᵀy ≤ 0,  y ≠ 0.        (2)

Let F_P and F_D denote the sets of matrices A such that (1) and (2) are respectively feasible. The sets F_P and F_D are closed and F_P ∪ F_D = ℝ^{m×n}.
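
Feasibility of (1) can be tested numerically. The sketch below is ours, not part of the paper: since the feasible set of (1) is a cone, the constraint x ≠ 0 may be normalized to eᵀx = 1, which turns (1) into a plain linear-programming feasibility problem (`primal_feasible` is a hypothetical helper name).

```python
import numpy as np
from scipy.optimize import linprog

def primal_feasible(A):
    """Decide feasibility of (1): Ax = 0, x >= 0, x != 0.
    The feasible set is a cone, so x != 0 can be normalized to sum(x) = 1."""
    m, n = A.shape
    A_eq = np.vstack([A, np.ones((1, n))])   # Ax = 0 together with sum(x) = 1
    b_eq = np.append(np.zeros(m), 1.0)
    res = linprog(np.zeros(n), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n)
    return res.status == 0                   # 0: a feasible point was found

# By Gordan-type duality, (2) is feasible whenever (1) is infeasible;
# both are feasible exactly on the ill-posed set defined next.
```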

The set Σ := F_P ∩ F_D is the set of ill-posed matrices: for a given A ∈ Σ, arbitrarily small perturbations of A can change the feasibility of (1) and (2).

Renegar's condition number is defined as

    C(A) := ‖A‖ / min{‖A − A′‖ : A′ ∈ Σ}.

Here ‖A‖ denotes the operator norm of A induced by a given choice of norms in ℝ^n and ℝ^m. When the Euclidean norm is used in both ℝ^n and ℝ^m, we shall write C_{2,2}(A) for C(A). On the other hand, when the one-norm is used in ℝ^n and the Euclidean norm is used in ℝ^m, we shall write C_{1,2}(A) for C(A). These are the two cases we will consider in the sequel.

Assume A = [a_1 ⋯ a_n] ∈ ℝ^{m×n} with a_i ≠ 0 for i = 1,…,n. The Goffin-Cheung-Cucker condition number is defined as

    C_GCC(A) := 1 / | min_{‖y‖=1} max_{i∈{1,…,n}} (a_iᵀ y)/‖a_i‖ |.

Here ‖·‖ denotes the Euclidean norm in ℝ^m. The quantity 1/C_GCC(A) is the Euclidean distance from the origin to the boundary of the convex hull of the set of normalized vectors {a_i/‖a_i‖ : i = 1,…,n}. Furthermore, when A ∈ F_D, this distance coincides with the thickness of the cone of feasible solutions to (2).

Let Gr_{n,m} denote the set of m-dimensional linear subspaces of ℝ^n, that is, the m-dimensional Grassmann manifold in ℝ^n. Let

    P_m := {W ∈ Gr_{n,m} : W^⊥ ∩ ℝ^n_+ ≠ {0}},    D_m := {W ∈ Gr_{n,m} : W ∩ ℝ^n_+ ≠ {0}},

and

    Σ_m := P_m ∩ D_m = {W ∈ Gr_{n,m} : W ∩ ℝ^n_+ ≠ {0}, W ∩ int(ℝ^n_+) = ∅}.

The sets P_m, D_m, and Σ_m are analogous to the sets F_P, F_D, and Σ respectively: if A ∈ ℝ^{m×n} is full-rank, then A ∈ F_P ⟺ span(Aᵀ) ∈ P_m and A ∈ F_D ⟺ span(Aᵀ) ∈ D_m. In particular, A ∈ Σ ⟺ span(Aᵀ) ∈ Σ_m. The Grassmann condition number is defined as [1]

    C_Gr(A) := 1 / min{d(span(Aᵀ), W) : W ∈ Σ_m}.

Here d(W_1, W_2) = σ_max(Π_{W_1} − Π_{W_2}), where Π_{W_i} denotes the orthogonal projection onto W_i for i = 1, 2. In the above expression and in the sequel, σ_max(M) and σ_min(M) denote the largest and smallest singular values of the matrix M, respectively.
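
The projection distance d(·,·) is straightforward to compute from orthonormal bases. A minimal numpy sketch (ours, not from the paper; `projection_distance` is our own name):

```python
import numpy as np

def projection_distance(B1, B2):
    """d(W1, W2) = sigma_max(P_W1 - P_W2) for the subspaces spanned by the
    columns of B1 and B2 (both assumed to have full column rank)."""
    Q1, _ = np.linalg.qr(B1)           # orthonormal basis of W1
    Q2, _ = np.linalg.qr(B2)           # orthonormal basis of W2
    P1, P2 = Q1 @ Q1.T, Q2 @ Q2.T      # orthogonal projectors onto W1, W2
    return np.linalg.norm(P1 - P2, 2)  # spectral norm = largest singular value

# For any candidate W in Sigma_m, d(span(A^T), W) upper-bounds the minimum in
# the definition above, hence yields the lower bound C_Gr(A) >= 1/d.
```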

We next recall some key properties of the condition numbers C_{2,2}, C_{1,2}, C_GCC and C_Gr. Throughout the remaining part of the paper assume A = [a_1 ⋯ a_n] ∈ ℝ^{m×n} is full-rank and a_i ≠ 0 for i = 1,…,n.

From the relationship between the one-norm and the Euclidean norm in ℝ^n, it readily follows that

    (1/√n)·C_{2,2}(A) ≤ C_{1,2}(A) ≤ √n·C_{2,2}(A).        (3)

The following property is a consequence of [5, Theorem 1]:

    C_GCC(A) ≤ C_{1,2}(A) ≤ (max_{i=1,…,n}‖a_i‖ / min_{i=1,…,n}‖a_i‖)·C_GCC(A).        (4)

The following property was established in [1, Theorem 1.4]:

    C_Gr(A) ≤ C_{2,2}(A) ≤ (σ_max(A)/σ_min(A))·C_Gr(A).        (5)

The inequalities (4) and (5) reveal an interesting parallel between the pairs C_GCC(A), C_{1,2}(A) and C_Gr(A), C_{2,2}(A). This parallel becomes especially striking by observing that the fractions on the right-hand sides of (4) and (5) both measure the spread of the gain of A over the unit sphere of the relevant norm. Since the extreme points of the unit ball of the one-norm are ±e_1,…,±e_n and ‖A(±e_i)‖ = ‖a_i‖,

    max_{i=1,…,n}‖a_i‖ / min_{i=1,…,n}‖a_i‖ = max{‖Ax‖ : x ∈ {±e_1,…,±e_n}} / min{‖Ax‖ : x ∈ {±e_1,…,±e_n}},

while

    σ_max(A)/σ_min(A) = max{‖Aᵀy‖ : ‖y‖ = 1} / min{‖Aᵀy‖ : ‖y‖ = 1}.

3 Preconditioning via normalization and balancing

Inequalities (4) and (5) suggest two preconditioning procedures for improving Renegar's condition number C(A). The first one is to normalize, that is, to scale the columns of A so that the preconditioned matrix Ã satisfies max_{i=1,…,n}‖ã_i‖ = min_{i=1,…,n}‖ã_i‖. The second one is to balance, that is, to apply an orthogonalization procedure to the rows of A, such as Gram-Schmidt or QR-factorization, so that the preconditioned matrix Ã satisfies σ_max(Ã) = σ_min(Ã).
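
In matrix terms, each procedure is a one-liner. A minimal numpy sketch, assuming a full-rank A with non-zero columns (ours, not from the paper; the function names are our own):

```python
import numpy as np

def normalize_columns(A):
    """Column normalization: A_tilde = A @ inv(D) with D = diag(||a_i||),
    so every column of A_tilde has unit Euclidean norm."""
    d = np.linalg.norm(A, axis=0)   # column norms (assumed nonzero)
    return A / d, np.diag(d)

def balance_rows(A):
    """Row balancing: A_tilde = Q^T where A^T = QR is the reduced QR
    decomposition, so A_tilde has orthonormal rows and
    sigma_max(A_tilde) = sigma_min(A_tilde) = 1."""
    Q, R = np.linalg.qr(A.T)        # Q: n x m, R: m x m
    return Q.T, R
```

Since Ã = Qᵀ = R^{−T}A, balancing is an invertible row operation, and since Ã = AD^{−1}, normalization is an invertible column scaling; this is what makes the recovery of solutions described below possible.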

Observe that if the initial matrix A is full rank and has non-zero columns, then these properties are preserved by each of the above two preconditioning procedures. The following proposition formally states some properties of these two procedures.

Proposition 1 Assume A = [a_1 ⋯ a_n] ∈ ℝ^{m×n} is full-rank and a_i ≠ 0 for i = 1,…,n.

(a) Let Ã be the matrix obtained after normalizing the columns of A, that is, ã_i = a_i/‖a_i‖ for i = 1,…,n. Then

    C_{1,2}(Ã) = C_GCC(Ã) = C_GCC(A).

This transformation is optimal in the following sense:

    C_{1,2}(Ã) = min{C_{1,2}(AD) : D is diagonal positive definite}.

(b) Let Ã be the matrix obtained after balancing the rows of A, that is, Ã = Qᵀ, where QR = Aᵀ with Q ∈ ℝ^{n×m} and R ∈ ℝ^{m×m} is the QR-decomposition of Aᵀ. Then

    C_{2,2}(Ã) = C_Gr(Ã) = C_Gr(A).

This transformation is optimal in the following sense:

    C_{2,2}(Ã) = min{C_{2,2}(PA) : P ∈ ℝ^{m×m} is non-singular}.

Proof: Parts (a) and (b) follow respectively from (4) and (5).

Observe that the normalization procedure transforms the solutions to the original problem (1) when this system is feasible. More precisely, Ax = 0, x ≥ 0 if and only if Ã(Dx) = 0, Dx ≥ 0, where D is the diagonal matrix whose diagonal entries are the norms of the columns of A. Thus a solution to the original problem (1) can be readily obtained from a solution to the column-normalized preconditioned problem: premultiply by D^{−1}. At the same time, observe that the set of solutions to (2) does not change when the columns of A are normalized.

Similarly, the balancing procedure transforms the solutions to the original problem (2). In this case Aᵀy ≤ 0, y ≠ 0 if and only if Ãᵀ(Ry) ≤ 0, Ry ≠ 0, where QR = Aᵀ is the QR-decomposition of Aᵀ. Hence a solution to the original problem (2) can be readily obtained from a solution to the row-balanced preconditioned problem: premultiply by R^{−1}. At the same time, observe that the set of solutions to (1) does not change when the rows of A are balanced.
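
These invariances and recovery maps are easy to check numerically; the following continues the sketch above (again ours, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 6))                 # full rank with probability 1

# Normalization: A_tilde = A @ inv(D), so a solution x_tilde of (1) for
# A_tilde maps back to x = inv(D) @ x_tilde.
A_norm, D = normalize_columns(A)
print(np.allclose(A_norm @ D, A))               # True: A_tilde @ D = A

# Balancing: A_bal = Q^T = inv(R).T @ A has orthonormal rows, so all of its
# singular values equal 1, i.e. sigma_max(A_bal) = sigma_min(A_bal).
A_bal, R = balance_rows(A)
print(np.allclose(A_bal @ A_bal.T, np.eye(3)))  # True: orthonormal rows
print(np.linalg.svd(A_bal, compute_uv=False))   # [1. 1. 1.]

# A^T (inv(R) y_hat) = A_bal^T y_hat: solutions of (2) map back via inv(R).
y_hat = rng.standard_normal(3)
print(np.allclose(A.T @ np.linalg.solve(R, y_hat), A_bal.T @ y_hat))
```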

We next present our main result concerning properties of the two combinations of the normalization and balancing procedures.

Theorem 1 Assume A = [a_1 ⋯ a_n] ∈ ℝ^{m×n} is full-rank and a_i ≠ 0 for i = 1,…,n.

(a) Let Â be the matrix obtained after normalizing the columns of A and then balancing the rows of the resulting matrix. Then

    C_GCC(Â)/√n ≤ C_{2,2}(Â) = C_Gr(Â) ≤ √n·C_GCC(A).        (6)

(b) Let Â be the matrix obtained after balancing the rows of A and then normalizing the columns of the resulting matrix. Then

    C_Gr(Â)/√n ≤ C_{1,2}(Â) = C_GCC(Â) ≤ √n·C_Gr(A).        (7)

Proof:

(a) Let Ã be the matrix obtained by normalizing the columns of A. Proposition 1(a) yields

    C_{1,2}(Ã) = C_GCC(Ã) = C_GCC(A).        (8)

Since Â is obtained by balancing the rows of Ã, Proposition 1(b) together with (5) yields

    C_{2,2}(Â) = C_Gr(Â) = C_Gr(Ã) ≤ C_{2,2}(Ã).        (9)

Inequality (6) now follows by combining (3), (4), (8), and (9).

(b) The proof of this part is essentially identical to that of part (a), using the parallel roles of the pairs C_GCC, C_{1,2} and C_Gr, C_{2,2} apparent from (4), (5), and Proposition 1.

Observe that both combined preconditioners in Theorem 1 transform A into a matrix Â = PAD, where D is a diagonal matrix and P is a square non-singular matrix. The particular P and D depend on which procedure is applied first. Hence each of the combined preconditioners transforms both sets of solutions to the original pair of problems (1) and (2). In this case, solutions to the original problems can be readily obtained from solutions to the preconditioned problems via premultiplication by D for the solutions to (1) and by Pᵀ for the solutions to (2).
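
A sketch of the two combined preconditioners and of their recovery data, reusing the helpers defined earlier (ours; `precondition` is a hypothetical name):

```python
import numpy as np

def precondition(A, normalize_first=True):
    """Combined preconditioner A_hat = P @ A @ D as in Theorem 1.
    Returns (A_hat, P, D); a solution x_hat of (1) for A_hat maps back to
    x = D @ x_hat, and a solution y_hat of (2) maps back to y = P.T @ y_hat."""
    if normalize_first:                 # order of Theorem 1(a)
        A1, D0 = normalize_columns(A)   # A1 = A @ inv(D0)
        A_hat, R = balance_rows(A1)     # A_hat = inv(R).T @ A1
    else:                               # order of Theorem 1(b)
        A1, R = balance_rows(A)
        A_hat, D0 = normalize_columns(A1)
    P = np.linalg.inv(R).T              # small m x m inverse
    D = np.linalg.inv(D0)
    return A_hat, P, D
```

In both branches Â = P A D with the returned P and D, which is exactly the form used in the discussion above.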

As discussed in [1, Section 5], there is no general relationship between C_Gr and C_GCC, meaning that either one can be much larger than the other in a dimension-independent fashion. Theorem 1(a) shows that when C_GCC(A) ≤ C_Gr(A), column normalization followed by row balancing would reduce both C_Gr and C_{2,2} while not increasing C_GCC beyond a factor of n. Likewise, Theorem 1(b) shows that when C_Gr(A) ≤ C_GCC(A), row balancing followed by column normalization would reduce both C_GCC and C_{1,2} while not increasing C_Gr beyond a factor of n. In other words, one of the two combinations of column normalization and row balancing will always improve the values of all three condition numbers C_GCC(A), C_Gr(A) and C(A) modulo a factor of n.

The following two questions concerning a potential strengthening of Theorem 1 naturally arise:

(i) Does the order of row balancing and column normalization matter in each part of Theorem 1?

(ii) Are the leftmost and rightmost expressions in (6) and (7) within a small power of n? In other words, do the combined preconditioners make all of C_GCC(A), C_Gr(A) and C(A) the same modulo a small power of n?

The examples below, adapted from [1, Section 5], show that the answers to these questions are yes and no respectively. Hence without further assumptions, the statement of Theorem 1 cannot be strengthened along these lines.

Example 1 Assume ε > 0 and let

    A = [ 1    1    2/ε ]
        [ 1   −1    0   ].

It is easy to see that C_GCC(A) = √2. After balancing the rows of A and normalizing the columns of the resulting matrix we get

    Â = (1/√(2(1+ε²))) [ ε           ε            √(2(1+ε²)) ]
                       [ √(2+ε²)    −√(2+ε²)      0          ].

It is easy to show that C_GCC(Â) = √(2(1+ε²))/ε. Thus for ε > 0 sufficiently small, C_GCC(Â) can be arbitrarily larger than C_GCC(A). Therefore (6) does not hold for A, Â.
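
Example 1 can be reproduced numerically with the helpers above: for m = 2 the quantity 1/C_GCC(A) can be approximated by a grid search over the unit circle. Again, this is our sketch, not part of the paper:

```python
import numpy as np

def gcc_condition_2d(A, grid=200000):
    """C_GCC(A) = 1/|min_{||y||=1} max_i a_i^T y / ||a_i|||, evaluated on a
    fine grid of the unit circle (m = 2 only; a coarse numerical check)."""
    U = A / np.linalg.norm(A, axis=0)     # normalized columns
    t = np.linspace(0.0, 2.0 * np.pi, grid, endpoint=False)
    Y = np.stack([np.cos(t), np.sin(t)])  # unit vectors y on the grid
    theta = (U.T @ Y).max(axis=0).min()   # min over y of max over i
    return 1.0 / abs(theta)

eps = 1e-3
A = np.array([[1.0, 1.0, 2.0 / eps], [1.0, -1.0, 0.0]])
print(gcc_condition_2d(A))                            # ~ sqrt(2)
A_hat, P, D = precondition(A, normalize_first=False)  # balance, then normalize
print(gcc_condition_2d(A_hat))                        # ~ sqrt(2(1+eps^2))/eps
```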

Example 2 Assume 0 < ε < 1/2 and let A ∈ ℝ^{2×3} be the matrix of the corresponding example in [1, Section 5]. Using [1, Proposition 1.6], it can be shown that C_Gr(A) = √(1+(1+ε)²)/(1+ε), which remains bounded as ε decreases to zero. After normalizing the columns of A and balancing the rows of the resulting matrix, we obtain a matrix Â whose entries can be expressed in terms of δ = √(1+3(1+ε)²) and θ = 1/(δ+ε). Using [1, Proposition 1.6] again, it can be shown that C_Gr(Â) = √(1+δ²)/ε. Therefore, C_Gr(Â) can be arbitrarily larger than C_Gr(A). Hence (7) does not hold for A, Â.

Example 3 Assume ε > 0 and let

    A = [ 1      −1      1   ]
        [ 1+ε    −1+ε    1+ε ].

In this case one can check that C_GCC(A) grows like 2/ε as ε decreases to zero. After normalizing the columns of A and balancing the rows of the resulting matrix, we get

    Â = [ β/√(2β²+γ²)       −γ/√(2β²+γ²)       β/√(2β²+γ²)    ]
        [ γ/√(2(γ²+2β²))     2β/√(2(γ²+2β²))   γ/√(2(γ²+2β²)) ],

where β = 1/√(1+(1+ε)²) and γ = 1/√(1+(1−ε)²). It is easy to see that C_GCC(Â) = √2 for every ε > 0. Therefore, C_GCC(Â) can be arbitrarily smaller than C_GCC(A) in (6).

Example 4 Assume 0 < ε < 1/2 and let

    A = [ 1    1    ε ]
        [ 1   −1    0 ].

In this case the row space of A approaches, as ε decreases to zero, the subspace {z ∈ ℝ³ : z₃ = 0}, which belongs to Σ_m, and C_Gr(A) = √(2+ε²)/ε. After balancing the rows of A and normalizing the columns of the resulting matrix we get

    Â = [ √2/√(4+ε²)          √2/√(4+ε²)          1 ]
        [ √(2+ε²)/√(4+ε²)    −√(2+ε²)/√(4+ε²)     0 ].

Using [1, Proposition 1.6], it can be shown that C_Gr(Â) remains bounded as ε decreases to zero. Thus C_Gr(Â) can be arbitrarily smaller than C_Gr(A) in (7).

References

[1] D. Amelunxen and P. Bürgisser. A coordinate-free condition number for convex programming. SIAM Journal on Optimization, 22(3):1029–1041, 2012.

[2] D. Amelunxen and P. Bürgisser. Probabilistic analysis of the Grassmann condition number. Technical report, University of Paderborn, Germany, 2013.

[3] A. Belloni and R. Freund. A geometric analysis of Renegar's condition number, and its interplay with conic curvature. Mathematical Programming, 119:95–107, 2009.

[4] A. Belloni and R. Freund. Projective re-normalization for improving the behavior of a homogeneous conic linear system. Mathematical Programming, 118:279–299, 2009.

[5] D. Cheung and F. Cucker. A new condition number for linear programming. Mathematical Programming, 91(1):163–174, 2001.

[6] M. Epelman and R. Freund. A new condition measure, preconditioners, and relations between different measures of conditioning for conic linear systems. SIAM Journal on Optimization, 12:627–655, 2002.

[7] M. Epelman and R. M. Freund. Condition number complexity of an elementary algorithm for computing a reliable solution of a conic linear system. Mathematical Programming, 88(3):451–485, 2000.

[8] R. Freund and J. Vera. Condition-based complexity of convex optimization in conic linear form via the ellipsoid algorithm. SIAM Journal on Optimization, 10:155–176, 1999.

[9] J. L. Goffin. The relaxation method for solving systems of linear inequalities. Mathematics of Operations Research, pages 388–414, 1980.

[10] G. Golub and C. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, third edition, 1996.

[11] J. Peña and J. Renegar. Computing approximate solutions for convex conic systems of constraints. Mathematical Programming, 87:351–383, 2000.

[12] J. Renegar. Incorporating condition measures into the complexity theory of linear programming. SIAM Journal on Optimization, 5(3):506–524, 1995.

[13] J. Renegar. Linear programming, complexity theory and elementary functional analysis. Mathematical Programming, 70(1-3):279–351, 1995.

[14] N. Soheili and J. Peña. A smooth perceptron algorithm. SIAM Journal on Optimization, 22(2):728–737, 2012.

[15] L. Trefethen and D. Bau. Numerical Linear Algebra. SIAM, 1997.