Dipartimento di Energetica Sergio Stecco, Università di Firenze, Pubblicazione n. 2/2011.


A preconditioning framework for sequences of diagonally modified linear systems arising in optimization

Stefania Bellavia¹, Valentina De Simone², Daniela di Serafino²,³, Benedetta Morini¹

Dipartimento di Energetica Sergio Stecco, Università di Firenze, Pubblicazione n. 2/2011.

¹ Dipartimento di Energetica S. Stecco, Università di Firenze, Viale G.B. Morgagni 40/44, 50134 Firenze, Italy
² Dipartimento di Matematica, Seconda Università degli Studi di Napoli, Viale A. Lincoln 5, 81100 Caserta, Italy
³ Istituto di Calcolo e Reti ad Alte Prestazioni (ICAR), CNR, Via P. Castellino 111, 80131 Napoli, Italy

A PRECONDITIONING FRAMEWORK FOR SEQUENCES OF DIAGONALLY MODIFIED LINEAR SYSTEMS ARISING IN OPTIMIZATION

STEFANIA BELLAVIA, VALENTINA DE SIMONE, DANIELA DI SERAFINO, AND BENEDETTA MORINI

Abstract. We propose a framework for building preconditioners for sequences of linear systems of the form (A + Δ_k)x = b_k, where A is symmetric positive semidefinite and Δ_k is diagonal positive semidefinite. Such sequences arise in several optimization methods, e.g., in affine-scaling methods for bound-constrained convex quadratic programming and bound-constrained linear least squares, as well as in trust-region and overestimation methods for convex unconstrained optimization problems and nonlinear least squares. For all the matrices of a sequence, the preconditioners are obtained by updating any preconditioner for A available in the LDL^T form. The preconditioners in the framework satisfy the natural requirement of being effective on slowly varying sequences; furthermore, under an additional property they are also able to cluster eigenvalues of the preconditioned matrix when some entries of Δ_k are sufficiently large. We present two low-cost preconditioners sharing the abovementioned properties and evaluate them on sequences of linear systems generated by the reflective Newton method applied to bound-constrained convex quadratic programming problems, and on sequences arising in solving nonlinear least-squares problems with the Regularized Euclidean Residual method. The results of the numerical experiments show the effectiveness of these preconditioners.

Key words. Sequences of linear systems, preconditioning, incomplete LDL^T factorization, large-scale convex optimization.

AMS subject classifications. 65F08, 65F50, 90C20, 90C25.

1. Introduction. We are interested in the efficient solution of sequences of diagonally modified linear systems by preconditioned Krylov methods. The sequences considered here have the form

(1.1)   (A + Δ_k) x = b_k,   k = 0, 1, ...,

where A ∈ R^{n×n} is symmetric positive semidefinite and sparse, and Δ_k ∈ R^{n×n} is diagonal positive semidefinite. We assume that the systems are compatible and address the problem of preconditioning each of them by updating a preconditioner available for A, called seed preconditioner. The seed preconditioner is assumed to be a symmetric positive definite (SPD) matrix, although A may be semidefinite, and to be factorized in the LDL^T form. The goal is to perform low-cost updates of its factors to build an effective preconditioner for each matrix of the sequence. There are several possible applications of these preconditioner updating techniques. We focus on optimization, where notable classes of algorithms require the solution of sequences of the form (1.1) to compute the step between two subsequent iterates. When it is too expensive or even not feasible to factorize the matrices, the systems are solved by Conjugate Gradient (CG) type methods, and the availability

* Work partially supported by INdAM-GNCS under grant Progetti 2011 (Advanced Numerical Methods for Large-Scale Constrained Nonlinear Optimization Problems).
¹ Dipartimento di Energetica S. Stecco, Università degli Studi di Firenze, viale G. B. Morgagni 40, 50134 Firenze, Italy (stefania.bellavia@unifi.it, benedetta.morini@unifi.it).
² Dipartimento di Matematica, Seconda Università degli Studi di Napoli, viale A. Lincoln 5, 81100 Caserta, Italy (valentina.desimone@unina2.it).
³ Dipartimento di Matematica, Seconda Università degli Studi di Napoli, viale A. Lincoln 5, 81100 Caserta, Italy, and Istituto di Calcolo e Reti ad Alte Prestazioni, CNR, via P.
Castellino 111, 80131 Napoli, Italy (daniela.diserafino@unina2.it).

of efficient preconditioners is a key issue for the overall efficiency of the optimization methods. Next, we briefly outline optimization methods that call for the solution of the sequences under consideration. Some more details are given in section 4. Sequences of the form (1.1) arise, e.g., in affine-scaling interior-point methods for bound-constrained convex quadratic programming [4, 10]. In this case, the application of Newton-like methods to first-order optimality conditions gives rise to a system of the form shown in (1.1) at each iteration. Here A is the Hessian of the objective function, scaled on both sides by a suitable diagonal matrix, and Δ_k depends on the scaling matrix and possibly on regularization terms that mitigate the ill-conditioning of the systems [3]. The matrix Δ_k can be either positive definite or semidefinite. Further examples are provided by trust-region [9, ] and overestimation methods [1, 8, 18] for convex optimization problems. In these settings, at each iteration the step minimizes a regularized local model of the objective function. In trust-region methods, a sequence of systems of the form (1.1) has to be solved if the minimizer lies on the trust-region boundary and an approximation beyond the Steihaug-Toint point is required [9]. The sequence may stem from the application of a root-finding method to the so-called secular equation or from an interior-point sequential subspace minimization that solves the inequality constrained trust-region problem over a sequence of evolving low-dimensional subspaces [3, 4]. In these cases Δ_k = α_k G, where α_k is a nonnegative scalar and G is a diagonal positive definite matrix specifying an elliptical shape for the trust region. In overestimation methods [1, 8, 18], sequences of systems of the form (1.1) derive from the solution of the secular equation, and Δ_k = α_k I for a positive α_k. We note that an efficient preconditioning strategy for the matrices A + Δ_k provides a viable alternative to procedures that find an approximate step over a sequence of expanding subspaces associated with the iterative linear system solver and then recover the solution in R^n, as proposed in [7, 6].

Preconditioner updating techniques are discussed in [2, 5, 6, ] for sequences of SPD systems differing by a diagonal matrix; in particular, Δ_k = α_k I, with α_k > 0, is considered in [2, 5, ]. The procedures in [5, 6] are based on the use of an approximate inverse seed preconditioner, i.e., an incomplete LDL^T factorization of A^{-1}, while those in [2, ] perform updates of an incomplete LDL^T factorization of A. All of them are effective alternatives to reusing the seed preconditioner over subsequent systems (frozen preconditioner) or to recomputing the preconditioner for each A + Δ_k. A remarkable aspect of the preconditioner updating techniques proposed in the literature is that they work well in spite of their low cost. The frozen preconditioner may yield convergence slowdown, while the recomputed one may be too expensive and pointlessly accurate; in practice, updated preconditioners lie between the frozen and the recomputed ones in terms of solver iterations, but are often the fastest ones in terms of computational time.

In this paper we propose a framework for building preconditioners for the sequences under consideration and provide an analysis of their quality in terms of the size of the entries of Δ_k.
Since a natural requirement is that the preconditioners work well on slowly varying sequences, we first show their effectiveness when the matrices A + Δ_k undergo small changes. Then we discuss their ability to cluster eigenvalues of the preconditioned matrix when some entries of Δ_k are sufficiently large. Finally, we present two specific preconditioners in the proposed framework, which can be constructed at a low cost. The efficiency of these preconditioners is tested on two sets of problems. One set consists of sequences of systems arising in the application of the reflective Newton method [10], as implemented in the quadprog function of the Matlab

Optimization Toolbox; in this case, the preconditioners are embedded into quadprog and the tests are carried out on bound-constrained convex quadratic programming problems from the CUTEr collection [7]. The other set is made of sequences generated in the application of the Regularized Euclidean Residual overestimation method [1] to nonlinear least-squares problems.

The paper is organized as follows. In section 2 we describe our framework and provide a theoretical analysis of the relevant preconditioners. In section 3 we present two preconditioners belonging to the framework, which can be built at a low computational cost. Section 4 is devoted to the numerical experiments and section 5 provides some concluding remarks.

1.1. Notations. In the following, ‖·‖ denotes the vector or matrix 2-norm. For any matrix M, either m_{i,j} or (M)_{i,j} denotes the (i,j)th entry of M; furthermore, for vectors and matrices having a subscript k, such as v_k and M_k, we use v_i^k and m_{i,j}^k to represent the ith entry of v_k and the (i,j)th entry of M_k, respectively. If v = (v_1, ..., v_n)^T, then diag(v), as well as diag(v_1, ..., v_n), is the n × n diagonal matrix having the entries of v on the main diagonal. If M is an n × n matrix, diag(M) is the n × n diagonal matrix with the same diagonal entries as M, while off(M) = M − diag(M), i.e., off(M) is the matrix consisting of the off-diagonal part of M. For any matrix M, nnz(M) is the number of nonzero entries of M. If M is symmetric, λ_i(M) denotes the ith eigenvalue of M (with the eigenvalues sorted in any order), while λ_min(M) and λ_max(M) denote its minimum and maximum eigenvalues. Finally, the sequence of matrices to be preconditioned is indicated by {A + Δ_k} and the corresponding sequence of preconditioners by {P_k}.

2. A framework for updating preconditioners. In this section we define a framework for the construction and analysis of a preconditioner P_k for each matrix A + Δ_k, where Δ_k = diag(δ_1^k, ..., δ_n^k) and δ_i^k ≥ 0, i = 1, ..., n. The preconditioner P_k is obtained by updating an SPD preconditioner available for A in the factorized LDL^T form, where L is unit lower triangular and D = diag(d_1, ..., d_n) with d_i > 0, i = 1, ..., n.

Definition 2.1. A sequence of preconditioners {P_k} for {A + Δ_k} belongs to the class UFP(A, Δ_k) (Updated Factorization Preconditioners for {A + Δ_k}) if

(2.1)   P_k = L_k D_k L_k^T,

with L_k lower triangular and D_k = diag(d_1^k, ..., d_n^k) such that
(A.1) d_i^k ≥ d_i + δ_i^k, i = 1, ..., n;
(A.2) there exists σ > 0 independent of k such that ‖D_k − D‖ ≤ σ‖Δ_k‖;
(A.3) diag(L_k) = (1, 1, ..., 1)^T and off(L_k) = off(L) S_k, where S_k = D D_k^{-1}.

The previous definition is motivated by the desire of having preconditioners that can be applied at the same cost as the seed preconditioner, and that mimic in some sense the behaviour of the corresponding matrices A + Δ_k. The first objective is pursued by keeping fixed the sparsity pattern of the preconditioner and by computing its triangular factors with the simple diagonal scaling in A.3. The second objective is pursued by imposing that the distance of D_k from D behaves like ‖Δ_k‖ (see A.2), that the entries of D_k increase at least at the same rate as the entries of D + Δ_k (see A.1), and that the off-diagonal part of L_k decreases when the entries of Δ_k increase (see A.3), i.e., when the diagonal of A + Δ_k dominates over the remaining entries.
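To make the definition concrete, the following Python sketch (ours, with dense factors and invented names such as ufp_factors; it is not code from the paper) builds the factors of a preconditioner in UFP(A, Δ_k) from the seed factors and any diagonal update rule satisfying A.1 and A.2; property A.3 reduces to a column scaling of off(L):

```python
import numpy as np
from scipy.linalg import solve_triangular

def ufp_factors(L, d, delta, update_d):
    """Build (L_k, d_k) of a preconditioner in UFP(A, Delta_k).

    L: unit lower triangular seed factor; d: positive diagonal of D;
    delta: nonnegative diagonal of Delta_k; update_d: any rule for D_k
    satisfying A.1 and A.2 (two such rules are given in section 3).
    """
    d_k = update_d(L, d, delta)
    s = d / d_k                    # S_k = D D_k^{-1}; s_i in (0, 1] by A.1
    L_k = np.tril(L, -1) * s       # off(L_k) = off(L) S_k: scale column j by s_j
    np.fill_diagonal(L_k, 1.0)     # diag(L_k) = (1, ..., 1)^T
    return L_k, d_k

def apply_prec(L_k, d_k, r):
    """Apply P_k^{-1} to r via two triangular solves, P_k = L_k D_k L_k^T."""
    y = solve_triangular(L_k, r, lower=True, unit_diagonal=True)
    return solve_triangular(L_k.T, y / d_k, lower=False, unit_diagonal=True)
```

Applying P_k then costs two triangular solves and a diagonal scaling, exactly as applying the seed preconditioner, which is the first design goal stated above.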

By definition, P_k is SPD and S_k = diag(s_1^k, ..., s_n^k) is such that s_i^k ∈ (0, 1], i = 1, ..., n, i.e., ‖S_k‖ ≤ 1. It is worth noting that, by using this property of S_k, we can show that the conditioning of the matrices L_k is at least as good as the conditioning of L. Specifically, for any unit lower triangular matrix T and its inverse, we have

(2.2)   ‖T‖ ≤ n^{1/2} (1 + M_T),   ‖T^{-1}‖ ≤ n^{1/2} (1 + M_T)^{n−1},

where M_T = max_{j<i} |t_{i,j}| [20]. These bounds provide an upper bound on the condition number of T, which increases if M_T increases. Since s_i^k ∈ (0, 1], it is max_{j<i} |l_{i,j}^k| ≤ max_{j<i} |l_{i,j}|; hence the bound on the condition number of L_k is lower than or equal to the bound on the condition number of L. Finally, we observe that the computational cost of building P_k is at least equal to the cost of scaling L to build L_k, which is O(nnz(L)); then, the choice of D_k has a crucial role in determining the overall cost of the preconditioner. As shown in sections 3 and 4, the choice of D_k is important to achieve a good tradeoff between the whole computational cost and the effectiveness of the preconditioner.

2.1. Analysis of the preconditioners. To analyze the properties of the preconditioners in the class UFP(A, Δ_k), we first assume that A is SPD and the seed preconditioner is LDL^T = A. Then, in section 2.2, we relax this assumption and discuss the extension of the properties to the case where the seed preconditioner is an incomplete factorization of A and to the case where A is positive semidefinite. We start providing an expression of the diagonal and off-diagonal parts of A + Δ_k − P_k, as well as upper bounds on their norms, that will be useful next.

Lemma 2.2. Let {P_k} ∈ UFP(A, Δ_k). Then,

(2.3)   diag(A + Δ_k − P_k) = D + Δ_k − D_k + diag(off(L) S_k (D_k − D) off(L)^T),
(2.4)   off(A + Δ_k − P_k) = off(off(L) S_k (D_k − D) off(L)^T).

Proof. The diagonal entries of A + Δ_k − P_k have the following expression:

(2.5)   (A + Δ_k − P_k)_{i,i} = a_{i,i} + δ_i^k − (L_k D_k L_k^T)_{i,i} = a_{i,i} + δ_i^k − d_i^k − Σ_{r=1}^{i−1} (l_{i,r}^k)² d_r^k.

Hence, by using a_{i,i} = d_i + Σ_{r=1}^{i−1} (l_{i,r})² d_r and A.3 we get

(2.6)   (A + Δ_k − P_k)_{i,i} = d_i + δ_i^k − d_i^k + Σ_{r=1}^{i−1} (l_{i,r})² (d_r − (s_r^k)² d_r^k).

Furthermore, by the definition of S_k in A.3,

D − S_k² D_k = S_k (D_k − D),

and equality (2.3) easily follows. Concerning the off-diagonal entries of A + Δ_k − P_k, without loss of generality we consider i > j. From A.3 it follows that

(A + Δ_k − P_k)_{i,j} = (LDL^T)_{i,j} − (L_k D_k L_k^T)_{i,j}

= Σ_{r=1}^{j−1} (l_{i,r} l_{j,r} d_r − l_{i,r}^k l_{j,r}^k d_r^k) + l_{i,j} d_j − l_{i,j}^k d_j^k
= Σ_{r=1}^{j−1} l_{i,r} l_{j,r} (d_r − (s_r^k)² d_r^k) + l_{i,j} (d_j − s_j^k d_j^k),

which yields (2.4) since d_j − s_j^k d_j^k = 0.

Lemma 2.3. Let D, D_k and S_k be the matrices in Definition 2.1. Then

(2.7)   ‖diag(off(L) S_k (D_k − D) off(L)^T)‖ ≤ ‖off(L)‖² ‖S_k (D_k − D)‖ ≤ ‖off(L)‖² ‖D‖,
(2.8)   ‖off(off(L) S_k (D_k − D) off(L)^T)‖ ≤ ‖off(L)‖² ‖S_k (D_k − D)‖ ≤ ‖off(L)‖² ‖D‖.

Proof. The first inequality in (2.7) follows from the inequality max_{i,j} |b_{i,j}| ≤ ‖B‖, which holds for any matrix B. By observing that

(2.9)   ‖S_k (D_k − D)‖ = max_i ( d_i (d_i^k − d_i)/d_i^k )   and   0 ≤ (d_i^k − d_i)/d_i^k < 1,

we get ‖S_k (D_k − D)‖ ≤ ‖D‖, and the second inequality in (2.7) holds. Furthermore, since S_k (D_k − D) is positive semidefinite by construction, off(L) S_k (D_k − D) off(L)^T is positive semidefinite; hence, by [2, Lemma 4.1], it is

‖off(off(L) S_k (D_k − D) off(L)^T)‖ ≤ ‖off(L) S_k (D_k − D) off(L)^T‖,

from which the first inequality in (2.8) trivially follows. The second inequality in (2.8) is obtained by using (2.9).

By exploiting the previous lemmas, we show that P_k provides a good approximation of A + Δ_k when ‖Δ_k‖ is sufficiently small, and hence the sequence {P_k} can be effective on slowly varying sequences.

Theorem 2.1. Let {P_k} ∈ UFP(A, Δ_k). Then, there exists ζ > 0 independent of k such that

(2.10)   ‖A + Δ_k − P_k‖ ≤ ζ ‖Δ_k‖.

Proof. Using Lemmas 2.2 and 2.3 we have

‖A + Δ_k − P_k‖ ≤ ‖D + Δ_k − D_k‖ + 2 ‖off(L)‖² ‖S_k (D_k − D)‖.

Then, from A.2 and ‖S_k‖ ≤ 1, we get

‖A + Δ_k − P_k‖ ≤ (1 + σ) ‖Δ_k‖ + 2σ ‖off(L)‖² ‖Δ_k‖.

Thus (2.10) holds by letting ζ = 1 + σ + 2σ‖off(L)‖².

Clearly, P_k^{-1}(A + Δ_k) has real positive eigenvalues, since it is similar to an SPD matrix. The next theorem shows that these eigenvalues are clustered around 1 as ‖Δ_k‖ gets smaller.

Theorem 2.2. Let {P_k} ∈ UFP(A, Δ_k). For all ε > 0 there exists η > 0 independent of k such that, if ‖Δ_k‖ < η, then

|λ_i(P_k^{-1}(A + Δ_k)) − 1| < ε,   i = 1, ..., n.

Furthermore, if A + Δ_k − P_k has rank n − l, then l eigenvalues of P_k^{-1}(A + Δ_k) are equal to 1.

Proof. We note that P_k^{-1}(A + Δ_k) = ((A + Δ_k) + (P_k − A − Δ_k))^{-1}(A + Δ_k) = (I + (A + Δ_k)^{-1}(P_k − A − Δ_k))^{-1}. Then, if λ is an eigenvalue of P_k^{-1}(A + Δ_k), we have

(2.11)   v = λ (I + (A + Δ_k)^{-1}(P_k − A − Δ_k)) v,

where v is an eigenvector corresponding to λ. Without loss of generality, we assume ‖v‖ = 1. From (2.11) it follows that λ = 1 if and only if (A + Δ_k)^{-1}(P_k − A − Δ_k) v = 0, i.e., v belongs to the null space of P_k − A − Δ_k. So, if rank(P_k − A − Δ_k) = n − l, then P_k^{-1}(A + Δ_k) has at least l unit eigenvalues. Now suppose that (A + Δ_k)^{-1}(P_k − A − Δ_k) v ≠ 0 and multiply (2.11) by v^T on the left, obtaining

(2.12)   1 = λ (1 + v^T (A + Δ_k)^{-1}(P_k − A − Δ_k) v),

and hence

(2.13)   |λ − 1| = |v^T (A + Δ_k)^{-1}(P_k − A − Δ_k) v| / |1 + v^T (A + Δ_k)^{-1}(P_k − A − Δ_k) v|,

where 1 + v^T (A + Δ_k)^{-1}(P_k − A − Δ_k) v ≠ 0 by (2.12). By [15, Theorem 8.1.5] we have

λ_min(A + Δ_k) ≥ λ_min(A) + λ_min(Δ_k) ≥ λ_min(A),

which implies ‖(A + Δ_k)^{-1}‖ ≤ ‖A^{-1}‖. By Theorem 2.1, if ζ‖Δ_k‖‖A^{-1}‖ < 1, then

(2.14)   |λ − 1| ≤ ‖(A + Δ_k)^{-1}‖ ‖P_k − A − Δ_k‖ / (1 − ‖(A + Δ_k)^{-1}‖ ‖P_k − A − Δ_k‖) ≤ ζ‖Δ_k‖‖A^{-1}‖ / (1 − ζ‖Δ_k‖‖A^{-1}‖).

The thesis follows by taking

η = min{ 1/(2ζ‖A^{-1}‖),  ε/(ζ(1 + ε)‖A^{-1}‖) }.

Now we are interested in understanding the behaviour of P_k when ‖Δ_k‖ is large. By requiring that a sequence {P_k} belonging to UFP(A, Δ_k) satisfies the following additional property:

(A.4) there exists τ > 0 independent of k such that ‖D_k − (D + Δ_k)‖ < τ,

we can show that the effectiveness of P_k increases as the entries of Δ_k increase, as stated by the following theorem.

Theorem 2.3. Let {P_k} ∈ UFP(A, Δ_k) satisfy A.4. For all ε > 0 there exists ϑ > 0 independent of k such that, if λ_min(Δ_k) > ϑ, then

(2.15)   ‖A + Δ_k − P_k‖ ‖(A + Δ_k)^{-1}‖ < ε.

Proof. We first show that ‖A + Δ_k − P_k‖ is bounded above independently of Δ_k. From Lemmas 2.2 and 2.3 it follows that

‖A + Δ_k − P_k‖ ≤ ‖D + Δ_k − D_k‖ + 2‖off(L)‖² ‖D‖;

thus, by using A.4, we get

‖A + Δ_k − P_k‖ ≤ τ + 2‖off(L)‖² ‖D‖.

Furthermore, if λ_min(Δ_k) > ‖A‖ then λ_min(A + Δ_k) ≥ λ_min(Δ_k) − ‖A‖ > 0, and inequality (2.15) follows by taking

ϑ = max{ 2‖A‖,  (ε‖A‖ + τ + 2‖off(L)‖² ‖D‖)/ε }.

Remark 2.1. We observe that properties A.1–A.3 do not ensure that ‖A + Δ_k − P_k‖ is bounded independently of Δ_k if ‖Δ_k‖ increases, so the previous theorem does not hold without assumption A.4. For example, by considering D_k = D + 2Δ_k and defining L_k as in A.3, we have a sequence {P_k} ∈ UFP(A, Δ_k), but ‖D_k − (D + Δ_k)‖ = ‖Δ_k‖, which cannot be bounded independently of Δ_k.

The following theorem shows that the preconditioners satisfying A.1–A.4 are effective in clustering the eigenvalues when the entries of Δ_k are large.

Theorem 2.4. Let {P_k} ∈ UFP(A, Δ_k) be such that A.4 holds. For all ε > 0 there exists ξ > 0 independent of k such that, if q entries of Δ_k satisfy δ_i^k > ξ, then q eigenvalues λ_{j_1}, ..., λ_{j_q} of P_k^{-1}(A + Δ_k) satisfy

|λ_{j_r}(P_k^{-1}(A + Δ_k)) − 1| < ε,   r = 1, ..., q.

Proof. Without loss of generality, we assume that δ_1^k, ..., δ_q^k are the q largest entries of Δ_k. Then we partition A, Δ_k, L and D_k into blocks as follows:

(2.16)   A = [A_{1,1}, A_{2,1}^T; A_{2,1}, A_{2,2}],   L = [L_{1,1}, 0; L_{2,1}, L_{2,2}],
(2.17)   Δ_k = [Δ_1^k, 0; 0, Δ_2^k],   D_k = [D_1^k, 0; 0, D_2^k],

where A_{1,1}, D_1^k, L_{1,1} ∈ R^{q×q}, A_{2,2}, D_2^k, L_{2,2} ∈ R^{(n−q)×(n−q)} and A_{2,1}, L_{2,1} ∈ R^{(n−q)×q}. Now we consider D_k^{-1/2} L_k^{-1} (A + Δ_k) L_k^{-T} D_k^{-1/2}, which is similar to P_k^{-1}(A + Δ_k). It is easy to see that

D_k^{-1/2} L_k^{-1} (A + Δ_k) L_k^{-T} D_k^{-1/2} = [B_{1,1}, B_{2,1}^T; B_{2,1}, B_{2,2}],

where

(2.18)   B_{1,1} = (D_1^k)^{-1/2} ((L_k)_{1,1})^{-1} (A_{1,1} + Δ_1^k) ((L_k)_{1,1})^{-T} (D_1^k)^{-1/2},

and

(2.19)   B_{2,1} = (D_2^k)^{-1/2} ((L_k)_{2,2})^{-1} ( −(L_k)_{2,1} ((L_k)_{1,1})^{-1} (A_{1,1} + Δ_1^k) + A_{2,1} ) ((L_k)_{1,1})^{-T} (D_1^k)^{-1/2}

(we neglect, for simplicity, the superscript k in B_{i,j} and do not provide any expression for B_{2,2} because it is not needed next). In the following we show that

(2.20)   [B_{1,1}, B_{2,1}^T; B_{2,1}, B_{2,2}] = X + Y,

where

X = [I_q, 0; 0, B_{2,2}],   Y = [N, B_{2,1}^T; B_{2,1}, 0],

and ‖Y‖ vanishes as d_i^k, i = 1, ..., q, increases. We first consider the block B_{1,1}. From A.3 it follows that ((L_k)_{1,1})^{-1} = (I_q + off(L_{1,1}) D_1 (D_1^k)^{-1})^{-1}, where L_{1,1} and D_1 are the (1,1) blocks of the matrices L and D partitioned as in (2.16)–(2.17). Then, letting

Z = (D_1^k)^{-1/2} (I_q + off(L_{1,1}) D_1 (D_1^k)^{-1})^{-1} (D_1^k)^{1/2} − I_q = (I_q + (D_1^k)^{-1/2} off(L_{1,1}) D_1 (D_1^k)^{-1/2})^{-1} − I_q

and

W = (D_1^k)^{-1/2} (A_{1,1} + Δ_1^k) (D_1^k)^{-1/2},

B_{1,1} can be written as

B_{1,1} = (I_q + Z) W (I_q + Z)^T.

Let us analyze Z and W. We observe that for sufficiently large values of d_i^k, i = 1, ..., q, it is

‖(D_1^k)^{-1/2} off(L_{1,1}) D_1 (D_1^k)^{-1/2}‖ ≤ ‖(D_1^k)^{-1}‖ ‖off(L_{1,1})‖ ‖D_1‖ < 1.

Then, by using the Banach Lemma [15, Lemma 2.3.3] and noting that

Z = −(I_q + (D_1^k)^{-1/2} off(L_{1,1}) D_1 (D_1^k)^{-1/2})^{-1} (D_1^k)^{-1/2} off(L_{1,1}) D_1 (D_1^k)^{-1/2},

we get

(2.21)   ‖Z‖ ≤ ‖(D_1^k)^{-1/2} off(L_{1,1}) D_1 (D_1^k)^{-1/2}‖ / (1 − ‖(D_1^k)^{-1/2} off(L_{1,1}) D_1 (D_1^k)^{-1/2}‖),

hence ‖Z‖ vanishes when the values of d_i^k, i = 1, ..., q, increase. Furthermore, the matrix W can be written as

W = I_q + diag(W − I_q) + off(W),

where

w_{i,i} = (a_{i,i} + δ_i^k)/d_i^k,  i = 1, ..., q,   w_{i,j} = a_{i,j}/(d_i^k d_j^k)^{1/2},  i, j = 1, ..., q, i ≠ j.

Since

w_{i,i} − 1 = ( Σ_{r=1}^{i−1} (l_{i,r})² d_r + d_i + δ_i^k − d_i^k ) / d_i^k,

and, by hypothesis, |d_i + δ_i^k − d_i^k| < τ, we have that diag(W − I_q) and off(W) vanish as d_i^k, i = 1, ..., q, increases. Thus, B_{1,1} in (2.18) has the following form:

B_{1,1} = (I_q + Z)(I_q + diag(W − I_q) + off(W))(I_q + Z)^T = I_q + N,

where N is a matrix such that ‖N‖ reduces towards zero as d_i^k, i = 1, ..., q, increases. Clearly, N is symmetric because B_{1,1} is symmetric. Finally, we consider B_{2,1} in (2.19), which can be written as

B_{2,1} = −(D_2^k)^{-1/2} ((L_k)_{2,2})^{-1} (L_k)_{2,1} (D_1^k)^{1/2} B_{1,1} + (D_2^k)^{-1/2} ((L_k)_{2,2})^{-1} A_{2,1} ((L_k)_{1,1})^{-T} (D_1^k)^{-1/2}.

Property A.3 implies

max_{j<i} |((L_k)_{2,1})_{i,j}| ≤ max_{j<i} |(L_{2,1})_{i,j}|,   max_{j<i} |((L_k)_{2,2})_{i,j}| ≤ max_{j<i} |(L_{2,2})_{i,j}|,

and, by (2.2), we have that ‖((L_k)_{1,1})^{-1}‖ and ‖((L_k)_{2,2})^{-1}‖ are bounded. Furthermore, ‖(D_2^k)^{-1/2}‖ ≤ ‖D_2^{-1/2}‖ and, by A.3, (L_k)_{2,1}(D_1^k)^{1/2} = L_{2,1} D_1 (D_1^k)^{-1/2}. Then, we can conclude that ‖B_{2,1}‖ vanishes as the values of d_i^k, i = 1, ..., q, increase.

By applying [15, Theorem 8.1.5] to the matrix in (2.20), it follows that

λ_i(X) + λ_min(Y) ≤ λ_i(D_k^{-1/2} L_k^{-1} (A + Δ_k) L_k^{-T} D_k^{-1/2}) ≤ λ_i(X) + λ_max(Y).

Taking into account that X has q eigenvalues equal to 1, that λ_min(Y) and λ_max(Y) vanish as the values of d_i^k, i = 1, ..., q, increase, and that, by A.1, d_i^k ≥ δ_i^k, we can conclude that for all ε > 0 there exists ξ > 0 such that, if δ_i^k > ξ for i = 1, ..., q, then q eigenvalues λ_{j_1}, ..., λ_{j_q} of D_k^{-1/2} L_k^{-1} (A + Δ_k) L_k^{-T} D_k^{-1/2} satisfy |λ_{j_r} − 1| < ε.

2.2. Incomplete factorizations and positive semidefinite matrices. Consider now the case where A is SPD and the seed preconditioner is an incomplete factorization of A. Thus, instead of A = LDL^T we have A = LDL^T + C, where C is a nonzero matrix. It is easy to verify that (2.10) can be extended as

‖A + Δ_k − P_k‖ ≤ ζ‖Δ_k‖ + ‖C‖,

and inequality (2.14) becomes

|λ − 1| ≤ ‖A^{-1}‖ (ζ‖Δ_k‖ + ‖C‖) / (1 − ‖A^{-1}‖ (ζ‖Δ_k‖ + ‖C‖)),

provided that ‖A^{-1}‖ (ζ‖Δ_k‖ + ‖C‖) < 1. Therefore, when ‖Δ_k‖ is small the quality of the preconditioner P_k depends on the size of ‖C‖, and the ability of clustering the eigenvalues around 1 is lost if ‖C‖ is too large. On the other hand, when ‖Δ_k‖ is small we cannot expect that the preconditioner is effective on A + Δ_k if it is not effective on A. The situation is different when ‖Δ_k‖ is large. Indeed, it is easy to verify that Theorems 2.3 and 2.4 still hold independently of the size of ‖C‖. This is in agreement with the fact that, as all the diagonal entries of Δ_k increase, the matrix A + Δ_k becomes more and more diagonally dominant and P_k tends to a diagonal preconditioner with diagonal entries increasing as fast as the diagonal entries of Δ_k.

Finally, we discuss the case where A is positive semidefinite. One possible approach is to compute, as a seed preconditioner, an incomplete LDL^T factorization of the SPD matrix A + βI for some small positive β. We assume that A has rank r and that the eigenvalues of A are ordered as follows:

0 = λ_1(A) = λ_2(A) = ... = λ_{n−r}(A) < λ_{n−r+1}(A) ≤ ... ≤ λ_n(A).

Note that, if the null space of A has a nontrivial intersection with the null space of Δ_k, then A + Δ_k is singular and P_k^{-1}(A + Δ_k) has n − p null eigenvalues, where p is the rank of A + Δ_k and p ≥ r. In this case, as long as the system involving the matrix A + Δ_k is consistent and we solve it by using the CG method with zero starting guess, the null space of A + Δ_k never enters the iteration, i.e., the corresponding zero eigenvalues do not affect the convergence, and the rate of convergence depends on the ratio between the largest and the smallest positive eigenvalue [9, 5].
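The remark on consistent singular systems is easy to verify numerically; the following self-contained experiment (our illustration, not from the paper) runs plain CG with zero starting guess on a rank-deficient positive semidefinite matrix with a compatible right-hand side:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
B = rng.standard_normal((n, n - 2))
A = B @ B.T                          # positive semidefinite, rank n-2
b = A @ rng.standard_normal(n)       # consistent right-hand side: b in range(A)

# Plain CG with zero starting guess: all iterates stay in range(A),
# so the zero eigenvalues never enter the iteration.
x = np.zeros(n)
r = b - A @ x
p = r.copy()
for _ in range(2 * n):
    Ap = A @ p
    alpha = (r @ r) / (p @ Ap)
    x += alpha * p
    r_new = r - alpha * Ap
    if np.linalg.norm(r_new) <= 1e-10 * np.linalg.norm(b):
        break
    beta = (r_new @ r_new) / (r @ r)
    p = r_new + beta * p
    r = r_new

print("relative residual:", np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```

The iterates remain in the range of A, so the residual is driven to zero at a rate governed only by the positive part of the spectrum.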

Suppose first that A + βI = LDL^T. A straightforward extension of (2.10) holds:

(2.22)   ‖A + Δ_k − P_k‖ ≤ ζ‖Δ_k‖ + β.

Now let us analyse the eigenvalues of P_k^{-1}(A + Δ_k) when ‖Δ_k‖ is small. To this aim, we prove the following theorem.

Theorem 2.5. Let A + βI = LDL^T and {P_k} ∈ UFP(A, Δ_k). For all ε ∈ (0, β) there exists η > 0 independent of k such that, if ‖Δ_k‖ < η, then

(2.23)   1 − β/(λ_i(A) + β − ε) − ε < λ_i(P_k^{-1}(A + Δ_k)) < 1 − β/(λ_i(A) + β + ε) + ε,

for i = 1, ..., n.

Proof. We observe that P_k^{-1}(A + Δ_k) = P_k^{-1}(A + Δ_k + βI) − βP_k^{-1}. Furthermore, since P_k^{-1}(A + Δ_k) and P_k^{-1}(A + Δ_k + βI) are similar to the symmetric matrices P_k^{-1/2}(A + Δ_k)P_k^{-1/2} and P_k^{-1/2}(A + Δ_k + βI)P_k^{-1/2}, respectively, by using [15, Theorem 8.1.5] we have

(2.24)   λ_i(P_k^{-1}(A + Δ_k)) ≥ λ_i(−βP_k^{-1}) + λ_min(P_k^{-1}(A + Δ_k + βI)),
(2.25)   λ_i(P_k^{-1}(A + Δ_k)) ≤ λ_i(−βP_k^{-1}) + λ_max(P_k^{-1}(A + Δ_k + βI)),   i = 1, ..., n.

By Theorem 2.2 we have that, for all ε > 0, if ‖Δ_k‖ is sufficiently small then

(2.26)   |λ_max(P_k^{-1}(A + Δ_k + βI)) − 1| < ε,   |λ_min(P_k^{-1}(A + Δ_k + βI)) − 1| < ε.

To provide a bound on λ_i(−βP_k^{-1}), we write P_k as P_k = A + βI + (P_k − A − βI). By following the reasoning in the proof of Theorem 2.1, we find that there exists a constant ζ' independent of k such that

‖P_k − A − βI‖ ≤ ζ'‖Δ_k‖,

and hence

|λ_i(P_k − A − βI)| ≤ ζ'‖Δ_k‖,   i = 1, ..., n.

Let ε < β; if ‖Δ_k‖ is such that ζ'‖Δ_k‖ < ε, by using the previous inequality and [15, Theorem 8.1.5] we get

(2.27)   λ_i(A) + β − ε ≤ λ_i(P_k) ≤ λ_i(A) + β + ε,   i = 1, ..., n;

then, from (2.24)–(2.27), we can conclude that there exists η > 0 such that if ‖Δ_k‖ < η then (2.23) holds.

The previous theorem shows that, for sufficiently small values of ε and ‖Δ_k‖, the eigenvalues of P_k^{-1}(A + Δ_k) are clustered around 1 if the corresponding eigenvalues of A are nonzero and β is small. When λ_i(A) = 0, we have

0 ≤ λ_i(P_k^{-1}(A + Δ_k)) < c + ε,   c ≈ 0,

if β ≫ ε, and

0 ≤ λ_i(P_k^{-1}(A + Δ_k)) < c + ε,   c ≈ 1,

if β ≪ ε. Concerning Theorems 2.3 and 2.4, we can easily prove that they also hold when A + βI = LDL^T. This is because, as the diagonal entries of Δ_k increase, the effect of the perturbation βI on the preconditioner becomes negligible. Finally, we note that the results obtained with A + βI = LDL^T can be extended to incomplete factorizations of A + βI. In this case, Theorems 2.3 and 2.4 still hold, while the bounds (2.22) and (2.23) also include the error in the incomplete factorization, A + βI − LDL^T, similarly to the case where A is SPD.

3. Two practical preconditioners. In this section we present two specific sequences {P_k} belonging to the class UFP(A, Δ_k), which have low-cost implementations. The first one is obtained as a simple extension of the preconditioning strategy studied by the authors in [2] for sequences where Δ_k = α_k I, α_k > 0. In [2], given an incomplete LDL^T factorization of A, a preconditioner for A + α_k I is defined as

(3.1)   P_k = (L + E_k + F_k) D (L + E_k + F_k)^T,

where E_k is a diagonal matrix and F_k is a strictly lower triangular matrix. The diagonal entries of E_k = diag(e_1^k, ..., e_n^k) have the following form

e_i^k = ((d_i + α_k)/d_i)^{1/2} − 1,   for i = 1, ..., n,

while the nonzero entries of F_k = (f_{i,j}^k) are given by

f_{i,j}^k = (γ_j^k − 1) l_{i,j},   γ_j^k = (d_j/(d_j + α_k))^{1/2},

for i = 1, ..., n, and j = 1, ..., i − 1. This preconditioner can be extended to the matrix A + Δ_k by generalizing the definitions of e_i^k and f_{i,j}^k as follows:

(3.2)   e_i^k = ((d_i + δ_i^k)/d_i)^{1/2} − 1,
(3.3)   f_{i,j}^k = (γ_j^k − 1) l_{i,j},   γ_j^k = (d_j/(d_j + δ_j^k))^{1/2}.

It is easy to verify that the sequence {P_k} defined by (3.1), (3.2) and (3.3) belongs to UFP(A, Δ_k). By using (3.2) and (3.3), we can write L + E_k + F_k as

L + E_k + F_k = ((D + Δ_k) D^{-1})^{1/2} + off(L) (D (D + Δ_k)^{-1})^{1/2} = (I + off(L) D (D + Δ_k)^{-1}) ((D + Δ_k) D^{-1})^{1/2},

and this gives

P_k = (I + off(L) D (D + Δ_k)^{-1}) (D + Δ_k) (I + off(L) D (D + Δ_k)^{-1})^T.

Hence, P_k has the form (2.1), with D_k and L_k such that

(3.4)   d_i^k = d_i + δ_i^k,   i = 1, ..., n,

and diag(L_k) = (1, 1, ..., 1)^T, off(L_k) = off(L) S_k, S_k = D D_k^{-1} (for simplicity of notation we neglect the superscript k from the factors L_k and D_k). Trivially, conditions A.1–A.3 are satisfied. Furthermore, from (3.4) it follows that ‖D_k − (D + Δ_k)‖ = 0, hence P_k fulfills property A.4 too. We denote this first preconditioner by P_1^k. We note that the computational cost of building D_k is O(n) and hence the overall cost for the preconditioner P_1^k is generally low when compared with the cost of recomputing an incomplete LDL^T factorization of A + Δ_k. Furthermore, all the entries of L_k can be computed in parallel.
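In code, the first preconditioner therefore amounts to the O(n) diagonal update (3.4) followed by the generic scaling of the factor; a minimal sketch, reusing the hypothetical ufp_factors helper introduced in section 2:

```python
def p1_factors(L, d, delta):
    """First preconditioner P_1^k: D_k = D + Delta_k, see (3.4); O(n) update."""
    return ufp_factors(L, d, delta, update_d=lambda L, d, delta: d + delta)
```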

The second preconditioner, which we denote by P_2^k, is new. It has the form P_k = L_k D_k L_k^T (as for P_1^k, we neglect the superscript k in D_k and L_k), where the entries of D_k are defined as

(3.5)   d_i^k = d_i + δ_i^k + Σ_{j=1}^{i−1} (l_{i,j})² (d_j − (s_j^k)² d_j^k),

and L_k is defined as in A.3. This choice of D_k has been suggested by equality (2.6), which holds if A = LDL^T; in this case, by defining d_i^k as in (3.5) we get

(3.6)   diag(P_k) = diag(A + Δ_k),

i.e., we annihilate the diagonal of the error matrix A + Δ_k − P_k. Obviously, (3.6) is no longer true when A ≠ LDL^T, but we can assume that the discrepancy between diag(P_k) and diag(A + Δ_k) is small if the seed preconditioner is a good approximation of A. Now we show that the sequence {P_k} belongs to UFP(A, Δ_k), by proving that conditions A.1 and A.2 are satisfied. We proceed by induction. We first show that

(3.7)   d_i^k ≥ d_i + δ_i^k,
(3.8)   d_i − (s_i^k)² d_i^k ≥ 0,

for i = 1, ..., n. Trivially, for i = 1, d_1^k = d_1 + δ_1^k; then d_1 − (s_1^k)² d_1^k = d_1 (d_1^k − d_1)/d_1^k ≥ 0. For i > 1, suppose that (3.7) and (3.8) hold for all j < i. Then, inequalities (3.7) and (3.8) follow from (3.5). Therefore {P_k} satisfies A.1. Concerning A.2, we first show that, for i = 1, ..., n, there exists σ_i > 0 such that

(3.9)   d_i^k − d_i ≤ σ_i ‖Δ_k‖.

For i = 1 inequality (3.9) trivially holds with σ_1 = 1. For i > 1, by assuming that (3.9) holds for all j < i and using (3.5), we have

d_i^k − d_i = δ_i^k + Σ_{j=1}^{i−1} (l_{i,j})² s_j^k (d_j^k − d_j) ≤ δ_i^k + Σ_{j=1}^{i−1} (l_{i,j})² (d_j^k − d_j) ≤ δ_i^k + Σ_{j=1}^{i−1} σ_j (l_{i,j})² ‖Δ_k‖ ≤ (1 + Σ_{j=1}^{i−1} σ_j (l_{i,j})²) ‖Δ_k‖,

and inequality (3.9) holds with σ_i = 1 + Σ_{j=1}^{i−1} (l_{i,j})² σ_j. Then (3.9) holds for i = 1, ..., n and A.2 is satisfied with σ = max_i σ_i. By using (3.5) and noting that d_j − (s_j^k)² d_j^k = (1 − s_j^k) d_j ≤ d_j, we have

d_i^k − d_i − δ_i^k ≤ Σ_{j=1}^{i−1} (l_{i,j})² d_j,

and hence

‖D_k − (D + Δ_k)‖ ≤ ‖diag(off(L) D off(L)^T)‖.

Then P_2^k also satisfies property A.4. Finally, from (3.5) we see that the computational cost of building the factor D_k of P_2^k is O(nnz(L)), as the cost of building L_k. Therefore, P_2^k is more expensive than P_1^k, and its overhead decreases with the density of L. However, as shown in section 4, despite its higher computational cost, P_2^k is valuable as it provides a better approximation of the diagonal of the matrix A + Δ_k.

Remark 3.1. If A = LDL^T, the definition of d_i^k in (3.5) is equivalent to

(3.10)   d_i^k = a_{i,i} + δ_i^k − Σ_{j=1}^{i−1} (l_{i,j})² (s_j^k)² d_j^k,

and the latter can be used to compute D_k. This is no longer true when A ≠ LDL^T, and the use of (3.10) could yield a breakdown; in particular, d_i^k might become negative or smaller than d_i.
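A corresponding sketch of the diagonal update (3.5) of P_2^k uses a forward sweep over the rows of L, since d_i^k depends on the already-updated entries d_j^k, j < i (dense indexing for clarity; this is our illustration, not the authors' Fortran 90 implementation):

```python
import numpy as np

def p2_diagonal(L, d, delta):
    """Diagonal update (3.5): annihilates the diagonal of the error
    A + Delta_k - P_k when A = L D L^T.  Entry i needs d_j^k for j < i
    through s_j^k = d_j / d_j^k, hence the forward sweep; the work is
    proportional to nnz(L), like the scaling of the factor itself."""
    n = len(d)
    d_k = np.empty(n)
    for i in range(n):
        s = d[:i] / d_k[:i]                                  # s_j^k, j < i
        d_k[i] = d[i] + delta[i] + np.dot(L[i, :i] ** 2, d[:i] * (1.0 - s))
    return d_k
```

By construction the result satisfies A.1, so it can serve as the update_d argument of the earlier ufp_factors sketch.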

4. Numerical experiments. We analyzed the behaviour of the preconditioners P_1^k and P_2^k on sequences of linear systems arising in optimization. We considered two sets of sequences: the first one is made of sequences arising in the reflective Newton method for bound-constrained quadratic programming problems [10], implemented in the Matlab function quadprog; the second one consists of sequences of shifted linear systems arising in the Regularized Euclidean Residual (RER) overestimation method for nonlinear least squares [1]. The updating strategies were compared with other preconditioning techniques. For the first test set we considered the diagonal and tridiagonal preconditioners provided by quadprog, computed from scratch for each system of the sequence. For the second test set the comparison was carried out with an incomplete factorization preconditioner recomputed for each matrix of the sequence and with the frozen seed preconditioner. More details are given in sections 4.1 and 4.2. The comparisons were performed using the performance profiles proposed by Dolan and Moré [11], briefly described next.

Let S_{T,A} ≥ 0 be a statistic corresponding to the successful solution of a test problem T by an algorithm A, and suppose that the smaller this value is, the better the algorithm is considered; furthermore, let S_T be the smallest value attained on the test T by one of the algorithms under analysis, and χ_M be a value such that χ_M > S_{T,A}/S_T for all T and A. The performance profile of the algorithm A is defined as

π(χ) = (number of tests such that S_{T,A}/S_T ≤ χ) / (number of tests),   χ ≥ 1,

where the ratio S_{T,A}/S_T is set equal to χ_M if the algorithm A fails in solving the test T. In other words, π(χ) is the fraction of problems for which S_{T,A} is within a factor χ of the smallest value S_T. At χ = 1 the performance profile gives the percentage of problems for which the algorithm A is the best, while the percentage of problems that are successfully solved by the algorithm A is lim_{χ→χ_M} π(χ). Finally, we note that when the values of S_{T,A}/S_T have different magnitudes, it may be convenient to use a logarithmic scale performance profile, i.e.,

π_log(χ) = (number of tests such that log_2(S_{T,A}/S_T) ≤ χ) / (number of tests),   χ ≥ 0;

here we use a base-2 logarithm.

The experiments were run using Matlab R2010b on an Intel Core 2 Duo U9600 processor, with clock frequency of 1.6 GHz, 3 GB of RAM and 3 MB of cache memory. The seed preconditioners were computed using the cholinc Matlab function, which implements the zero fill-in factorization as well as a factorization based on a drop tolerance. In our experiments cholinc was run using a drop tolerance as specified in the next subsections. The preconditioners P_1^k and P_2^k were implemented as Fortran 90 mex-files with Matlab interface. We note that we initially coded both preconditioners in Matlab, but we found that accessing the columns of L (or the rows of L^T) one at a time, as required for the construction of P_2^k, produces a significant time overhead, independently of the sparsity of L. Conversely, P_1^k was efficiently implemented in Matlab, since the entries of its triangular factor can be updated all at once. We also observed that the times for building P_1^k were comparable for the Matlab and mex-Fortran 90 versions. Therefore, we decided to use mex-Fortran 90 implementations for both preconditioners. In the experiments, the elapsed times were measured, in seconds, using the tic and toc Matlab commands.
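The profile π(χ) is straightforward to compute from a table of statistics. In the sketch below (ours), failures are encoded as infinity, so that their ratio exceeds every finite χ, matching the χ_M convention above; the logarithmic profile is obtained by comparing log2 of the ratios with χ instead:

```python
import numpy as np

def performance_profile(S, chi):
    """pi(chi) for each algorithm from the statistics matrix S.

    S[t, a] is the statistic of algorithm a on test t (np.inf marks a
    failure); smaller is better.  chi is a sequence of abscissae.
    Returns an array of shape (len(chi), number of algorithms).
    """
    best = np.min(S, axis=1, keepdims=True)     # S_T, best value per test
    ratio = S / best                            # S_{T,A} / S_T (inf on failures)
    chi = np.asarray(chi, dtype=float).reshape(-1, 1, 1)
    return np.mean(ratio[None, :, :] <= chi, axis=1)
```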

4.1. Preconditioner updates in quadprog. The function quadprog available in the Matlab Optimization Toolbox implements the reflective Newton method for bound-constrained quadratic programming problems:

(4.1)   min_y { q(y) = (1/2) y^T Q y + c^T y : l ≤ y ≤ u },

where Q ∈ R^{n×n} is symmetric, c ∈ R^n, l ∈ {R ∪ {−∞}}^n, u ∈ {R ∪ {+∞}}^n, and l < u. In a preprocessing phase, quadprog shifts and scales the vectors l and u so that the finite values of l map to zero and the finite values of u map to 1. Hence, the quadratic programming problem becomes

(4.2)   min_x { q̄(x) = (1/2) x^T GQG x + c^T G x : l̄ ≤ x ≤ ū },

where G is a diagonal matrix with the ith diagonal entry equal to (u_i − l_i) if −∞ < l_i < u_i < +∞, and to 1 otherwise. The procedure implemented in quadprog is an affine-scaling method that generates a strictly feasible sequence {x_k}. At each iteration, the solution of the linear system

(4.3)   (M_k GQG M_k + D_g^k) s = −M_k g(x_k)

is required, where g(x_k) = ∇q̄(x_k) = GQG x_k + Gc, M_k is a positive definite diagonal matrix, and D_g^k is a positive semidefinite diagonal matrix. The diagonal entries of M_k are given by

m_{i,i}^k = (ū_i − x_i^k)^{1/2} if (g(x_k))_i < 0 and ū_i < +∞,
m_{i,i}^k = (x_i^k − l̄_i)^{1/2} if (g(x_k))_i > 0 and l̄_i > −∞,
m_{i,i}^k = 1 otherwise,

and the diagonal entries of D_g^k by

(D_g^k)_{i,i} = |(g(x_k))_i| if m_{i,i}^k = (ū_i − x_i^k)^{1/2} or m_{i,i}^k = (x_i^k − l̄_i)^{1/2},
(D_g^k)_{i,i} = 0 otherwise.

If the quadratic problems are convex, then Q is positive semidefinite and M_k GQG M_k is positive semidefinite too. Note that, if the box [l, u] is bounded, then ‖M_k‖ ≤ 1. Henceforth, M_k GQG M_k + D_g^k is denoted by H_k. The function quadprog solves the linear systems (4.3) by using the preconditioned CG procedure pcgr and supplies two kinds of preconditioners, computed from scratch for each system. The diagonal preconditioner P_diag^k = diag(‖H_k(:,1)‖, ..., ‖H_k(:,n)‖), where H_k(:,j) denotes the jth column of H_k, is used by default. Alternatively, the user can fix a positive integer l and choose a banded preconditioner which is formed extracting from H_k the main diagonal and l lower and upper diagonals. If the Cholesky factorization of the banded matrix fails, a shift is applied and a new Cholesky factorization is attempted. In our experiments we chose l = 1, corresponding to a tridiagonal preconditioner, denoted by P_trid^k in the following. To embed our updating techniques in quadprog we proceed as follows. If Q is SPD, we compute an incomplete Cholesky factorization of GQG using the cholinc function with drop tolerance 10^{-2}. Otherwise, we compute an incomplete Cholesky factorization of the shifted SPD matrix GQG + βI, using the same drop tolerance (in our experiments β = 10^{-2}). Trivially, for any k, these factorizations provide an incomplete LDL^T factorization of M_k GQG M_k, which is used as the seed preconditioner for the kth matrix of the sequence. We remark that, as in the original quadprog implementation, we precondition the linear systems (4.3); equivalent systems with matrices GQG + M_k^{-1} D_g^k M_k^{-1} are not preferable since GQG + M_k^{-1} D_g^k M_k^{-1} may not be bounded in a neighbourhood of a nondegenerate point satisfying the second-order sufficiency conditions (see [4, 10]). We run quadprog using the default tolerances for the stopping criteria of the nonlinear procedure, and the default bound cg_itmax = n/2 on the number of CG iterations. The solver pcgr was stopped when the 2-norm of the relative residual was less than a tolerance cg_tol. In order to test the performance of the preconditioners, we used cg_tol = 10^{-1}, 10^{-3}, 10^{-5}; the default value in quadprog is cg_tol = 10^{-1}.
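For illustration, the diagonal scalings entering (4.3) and the rescaling of the seed factorization can be sketched as follows (a dense Python sketch under the formulas above, with names of our choosing; this is not the Optimization Toolbox code):

```python
import numpy as np

def scaling_matrices(x, g, lb, ub):
    """Diagonals of M_k and D_g^k for the affine-scaling system (4.3).

    x: current strictly feasible iterate; g: gradient GQGx + Gc;
    lb, ub: scaled bounds, with -inf/+inf for missing bounds.
    """
    m = np.ones_like(x)
    dg = np.zeros_like(x)
    at_ub = (g < 0) & np.isfinite(ub)          # m_ii = (u_i - x_i)^(1/2)
    at_lb = (g > 0) & np.isfinite(lb)          # m_ii = (x_i - l_i)^(1/2)
    m[at_ub] = np.sqrt(ub[at_ub] - x[at_ub])
    m[at_lb] = np.sqrt(x[at_lb] - lb[at_lb])
    dg[at_ub | at_lb] = np.abs(g[at_ub | at_lb])
    return m, dg

def rescale_seed(L, d, m):
    """Turn a seed factorization GQG ~ L D L^T into one of M_k GQG M_k:
    M L D L^T M = (M L M^{-1})(M D M)(M L M^{-1})^T, and M L M^{-1} is
    again unit lower triangular because M is diagonal."""
    Lm = (L * m[:, None]) / m[None, :]         # M L M^{-1}
    return Lm, d * m**2                        # diag(M D M)
```

The rescaled factors can then be fed to the updates P_1^k/P_2^k with delta equal to the diagonal of D_g^k.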

Problem | n | dens(Q) | dens(L)
BIGGSB1 | | e-004 | .0e-003
CHENHARK | | e | e-003
CVXBQP1 | | e | e-003
JNLBRNG1 | | e | e-003
JNLBRNG2 | | e | e-003
JNLBRNGA | | e | e-003
JNLBRNGB | | e | e-003
NOBNDTOR | | e-004 | .9e-003
OBSTCLAE | | e | e-003
OBSTCLAL* | | e | e-003
OBSTCLBL* | | e | e-003
OBSTCLBM | | e | e-003
OBSTCLBU* | | e | e-003
PENTDI | | e | e-003
QUDLIN | | e | e-004
TORSION1* | | e-004 | .9e-003
TORSION2 | | e-004 | .9e-003
TORSION3* | | e-004 | .9e-003
TORSION4 | | e-004 | .9e-003
TORSION5* | | e-004 | .9e-003
TORSION6 | | e-004 | .9e-003
TORSIONA* | | e-004 | .9e-003
TORSIONB | | e-004 | .9e-003
TORSIONC* | | e-004 | .9e-003
TORSIOND | | e-004 | .9e-003
TORSIONE* | | e-004 | .9e-003
TORSIONF | | e-004 | .9e-003

Table 4.1. Bound-constrained convex quadratic programming problems from CUTEr.

We compared the four preconditioners on a test set consisting of all the 28 bound-constrained convex quadratic programming problems of dimension n > 500 available in the CUTEr collection. The test problems are listed in Table 4.1 along with their dimension, the density of the matrix Q and the density of the factor L in the seed preconditioner, defined as dens(Q) = nnz(Q)/n² and dens(L) = nnz(L)/(n(n+1)/2). These problems are equipped with an initial guess that in some cases is equal to the lower or the upper bound. Since quadprog requires a strictly feasible initial guess, we set it as x_0 = l + (u − l)/4 when the provided one is equal to the lower bound, and as x_0 = l + 3(u − l)/4 when the provided one is equal to the upper bound. Problems with modified initial guess are marked appending an asterisk to the name. First, we briefly analyze the quality of the solutions of the optimization problems computed by quadprog with the four preconditioning strategies. Then, we focus on evaluating the performance of the preconditioners in the solution of the sequences of linear systems. At the convergence of the affine-scaling method, quadprog provides a measure of first-order optimality, namely the value of ‖M_k g(x_k)‖. We observe that the lowest values of this measure are generally obtained when using P_1 and P_2. This is shown in Figure 4.1, where we plot the performance profiles of quadprog with the four preconditioners, in logarithmic scale, using as performance statistic the first-order optimality measure. The performance profiles correspond to cg_tol = 10^{-3}, but they are also representative of the runs performed with the other values of cg_tol. The only exception is for the problem CHENHARK with cg_tol = 10^{-1} and the

Fig. 4.1. Performance profiles of P_diag, P_trid, P_1 and P_2, for cg_tol = 10^{-3}: first-order optimality measure provided by quadprog.

preconditioner P_2. In this case, quadprog stops after 100 nonlinear iterations (the maximum number allowed) and the first-order optimality measure is greater than the optimality measure obtained by using the other three preconditioners. We consider this run as a failure of P_2. We found that the number of nonlinear iterations is fairly insensitive to the preconditioner used. Hence, the preconditioning strategies were applied to sequences of linear systems of comparable lengths. Figures 4.2–4.4 display the performance profiles of the four preconditioners for the three values of cg_tol, using as performance statistic both the number of CG iterations and the execution time. The execution time is the time required to solve the whole sequence of linear systems arising from the application of quadprog, including the time devoted to either the construction or the update of the preconditioner. The number of CG iterations is the sum of the CG iterations required for each system. The performance profiles are plotted in the interval [0, 10] to better highlight the results of the comparison. We see that P_1 and P_2 outperform P_diag and P_trid in terms of both CG iterations and time. Furthermore, P_1 and P_2 appear much more efficient than the others when the accuracy requirement in the solution of the linear systems increases. We also note that P_2 generally yields a smaller number of CG iterations than P_1; hence a more accurate approximation of the diagonal of the matrix H_k seems to improve the quality of the preconditioner. Finally, the higher cost of P_2 with respect to P_1 is offset by the greater effectiveness of P_2, so that P_1 and P_2 are comparable in terms of execution time. Detailed results of all the runs are reported in Appendix A along with further comments.

4.2. Preconditioner updates in RER. The Regularized Euclidean Residual method is an overestimation method for nonlinear least-squares problems: min_{x∈R^n} ‖F(x)‖, where F is continuously differentiable. The step p_k between two successive iterates x_k and x_{k+1} is chosen to approximately minimize the regularized Gauss-Newton model

(4.4)   m(p) = ‖F(x_k) + F′(x_k) p‖ + µ ‖p‖²,

Fig. 4.2. Performance profiles of P_diag, P_trid, P_1 and P_2 within quadprog, for cg_tol = 10^{-1}: number of CG iterations (left) and execution time (right).

Fig. 4.3. Performance profiles of P_diag, P_trid, P_1 and P_2 within quadprog, for cg_tol = 10^{-3}: number of CG iterations (left) and execution time (right).

Fig. 4.4. Performance profiles of P_diag, P_trid, P_1 and P_2 within quadprog, for cg_tol = 10^{-5}: number of CG iterations (left) and execution time (right).

where F′ is the Jacobian of F and µ is a positive parameter. Define the vector p(ν) so that

(4.5)   (F′(x_k)^T F′(x_k) + νI) p(ν) = −F′(x_k)^T F(x_k);

then, the minimizer of (4.4) is p(ν*), where ν* is the positive root of the secular equation

(4.6)   φ(ν) = 2µ ‖F(x_k) + F′(x_k) p(ν)‖ − ν.

The Newton or secant method applied to (4.6) produces a sequence of scalars {ν_r} and requires the evaluation of φ at each iterate. This amounts to solving a sequence of shifted linear systems of the form (4.5) with ν = ν_r. We tested the preconditioners P_1^k and P_2^k on sequences of shifted systems arising from the computation of some steps in the RER algorithm. In particular, we considered the eight sequences used in [2], generated in the solution of nonlinear systems of equations of dimension 10^4. The nonlinear systems were obtained from the discretization of three PDE problems: the Bratu problem [4], a PDE problem modelling a flow in a porous medium [4], and the PDE problem given in [, 3..]. For details on the eight sequences of linear systems we refer to [2, section 6.2]. The systems of each sequence were solved by using the CG method implemented in the Matlab function pcg, with zero initial guess. The CG method was terminated when the ratio between the 2-norm of the current and the initial residual was less than 10^{-6}, or when a maximum of 1000 iterations was reached. In all the sequences the matrix F′(x_k)^T F′(x_k) is positive definite; then, the seed preconditioner was obtained by performing an incomplete LDL^T factorization of such matrix with the cholinc function. The drop tolerance was chosen as explained in [2, section 6.2], so that CG coupled with the seed preconditioner succeeds in solving F′(x_k)^T F′(x_k) p = −F′(x_k)^T F(x_k). For comparison purposes, we also considered the following preconditioning strategies: recomputing the incomplete LDL^T factorization for each matrix of the sequence, and freezing the incomplete LDL^T factorization for all the values of ν_r. The corresponding preconditioners are denoted by P_rec and P_frz, respectively. Figure 4.5 displays the performance profiles of the four preconditioning strategies, in terms of the number of CG iterations and of the execution time needed to solve the whole sequences. We note that only P_rec and P_2 are able to solve all the systems of each sequence. The preconditioner P_1 fails in solving one system from one sequence, while the preconditioner P_frz fails in solving all the systems of one sequence but the first, and all the systems of another sequence but the first two. As expected, the behaviour of the updating strategies in terms of CG iterations is between the behaviours of P_rec and P_frz. The situation is different if we consider the execution time, since P_rec is outperformed by both the updating strategies because of the high cost of computing it. Finally, we note that P_2 requires a smaller execution time than P_1 for most of the problems; this means that the savings produced by P_2 in terms of CG iterations are significant enough to compensate for its higher computational cost.
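The root-finding loop that generates these shifted sequences can be sketched as follows, taking the secular equation in the form (4.6) reconstructed above; dense solves stand in for the preconditioned CG runs, and in the setting of this paper every shift ν_r would reuse a single seed factorization through the updates P_1/P_2 (the function and its safeguard-free Newton iteration are our illustration):

```python
import numpy as np

def rer_step(J, F, mu, nu0=1.0, tol=1e-8, max_it=50):
    """Newton's method on the secular equation (4.6); each evaluation of
    phi requires p(nu) from the shifted system (4.5)."""
    A = J.T @ J                       # F'(x_k)^T F'(x_k)
    b = -J.T @ F
    n = A.shape[0]
    nu = nu0
    for _ in range(max_it):
        p = np.linalg.solve(A + nu * np.eye(n), b)        # system (4.5)
        res = F + J @ p
        r = np.linalg.norm(res)
        phi = 2.0 * mu * r - nu                           # phi(nu), see (4.6)
        if abs(phi) <= tol:
            break
        dp = np.linalg.solve(A + nu * np.eye(n), -p)      # p'(nu)
        dphi = 2.0 * mu * (res @ (J @ dp)) / r - 1.0      # phi'(nu)
        nu -= phi / dphi
    return p, nu
```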

Fig. 4.5. Performance profiles of P_rec, P_frz, P_1 and P_2 on the sequences generated by RER: number of CG iterations (left) and execution time (right).

5. Conclusions. We have proposed a framework for preconditioning sequences of positive semidefinite linear systems that differ by a diagonal matrix. For any given sequence, the preconditioners are obtained by updating a seed preconditioner, available in factorized form. The preconditioners in the framework are effective on slowly varying sequences; furthermore, if they satisfy an additional requirement, they yield good spectral properties in the preconditioned matrices when the entries of Δ_k become large. Two practical low-cost preconditioners belonging to the framework have been presented and tested in the solution of sequences of linear systems arising from numerical optimization methods, showing the effectiveness of the proposed approach.

Appendix A. Complete numerical results with quadprog. In Tables A.1–A.6 we provide detailed results of the numerical experiments carried out with quadprog, varying the stopping tolerance cg_tol of the pcgr solver. For the sake of readability, the index k is dropped from the names P_diag, P_trid, P_1 and P_2 of the preconditioners. The headers of columns 3–6 have the following meaning:
Itn: number of nonlinear iterations performed by quadprog.
Itl: sum of the numbers of CG iterations performed at each nonlinear step.
Tp: execution time, in seconds, to build the preconditioner. For the updating procedures it is the time required by the incomplete Cholesky factorization of GQG (or GQG + βI) and by the subsequent updates; for P_diag it is the overall time needed to build the diagonal preconditioner at each nonlinear iteration; finally, for P_trid it is the overall time for extracting the tridiagonal band of H_k and performing its Cholesky factorization at each nonlinear iteration.
Tt: execution time, in seconds, required to solve the whole sequence of linear systems. It is the sum of the times required to build/update the preconditioners and to solve the linear systems by CG.
The tables show that our updating strategies are generally more effective than the other ones, as already observed in section 4.1. A few runs deserve some comments. First, the sequence of linear systems arising when quadprog is applied to CHENHARK is hard to solve for all the strategies. Second, the matrix Q in BIGGSB1 and QUDLIN is tridiagonal and therefore CG with P_trid works as a direct method (small differences in the execution times as cg_tol varies are due to the resolution of the tic and toc timer); this fact justifies the number of linear iterations performed. Interestingly, with BIGGSB1 using CG with either P_trid or P_2 requires the same number of iterations and effort in terms of computational time. Finally, we note that our updating strategies do not seem to be able to build efficient preconditioners for the sequence arising in the solution of CVXBQP1.

Table A.1. Comparison of P_diag, P_trid, P_1 and P_2 in the solution of CUTEr problems by quadprog, with cg_tol = 10^{-1}.

Problem | Prec | Itn | Itl | Tp | Tt
BIGGSB1: P_diag e-.5e | P_trid 7 7.5e- 4.5e- | P_1 9 6.7e- 6.8e- | P_2 7 7.3e- 4.4e-
BQPGAUSS: P_diag 5 9.e- 7.3e- | P_trid e- 5.6e- | P_1 e-3.0e0 | P_2 e-.8e0
CHENHARK: P_diag e-.0e | P_trid e-.6e | P_1 e- 9.e- | P_2 e-.7e
CVXBQP1: P_diag 4 4.4e- 3.8e- | P_trid 4 4.7e- 3.6e- | P_1 5 47.e0.8e0 | P_2 e0.5e0
JNLBRNG1: P_diag e-.3e0 | P_trid e-.4e0 | P_1 e- 3.6e- | P_2 7 47.e- 3.9e-
JNLBRNG2: P_diag 3 9.e- 8.e- | P_trid 53.6e- 8.9e- | P_1 e- 4.0e- | P_2 e- 3.8e-
JNLBRNGA: P_diag e-.e0 | P_trid e-.e0 | P_1 e- 3.e- | P_2 e- 4.3e-
JNLBRNGB: P_diag e- 7.5e- | P_trid 4 7.8e- 8.8e- | P_1 e- 3.4e- | P_2 33.6e- 4.e-
NOBNDTOR: P_diag e- 4.8e- | P_trid e- 3.e- | P_1 e-.7e- | P_2 40.0e-.0e-
OBSTCLAE: P_diag e-.e0 | P_trid e- 6.9e- | P_1 9 63.e- 4.9e- | P_2 e- 4.4e-
OBSTCLAL: P_diag e- 3.4e0 | P_trid 45.6e-.4e0 | P_1 4 0.4e- 6.6e- | P_2 e- 7.e-
OBSTCLBL: P_diag e- 5.5e- | P_trid 6 03.e- 9.e- | P_1 e- 5.e- | P_2 e- 6.e-
OBSTCLBM: P_diag 0.e- 4.7e- | P_trid e- 4.7e- | P_1 e- 5.4e- | P_2 e- 6.e-
OBSTCLBU: P_diag e- 4.8e- | P_trid 5 84.e- 3.9e- | P_1 47.5e- 5.0e- | P_2 e- 5.0e-

Problem | Prec | Itn | Itl | Tp | Tt
PENTDI: P_diag e-.e- | P_trid e-.6e- | P_1 e-.e- | P_2 e-.5e-
QUDLIN: P_diag 6 7.7e- 6.0e- | P_trid e- 7.e- | P_1 e- 8.e- | P_2 3 3.8e- 8.e-
TORSION1: P_diag e- 3.4e- | P_trid 8 5.3e-.4e- | P_1 e-.7e- | P_2 e-.7e-
TORSION2: P_diag 8 3.6e-.9e- | P_trid 5.e-.6e- | P_1 4 8.e-.7e- | P_2 e-.8e-
TORSION3: P_diag e-.9e- | P_trid 6 7.0e- 3.e- | P_1 e-.e- | P_2 6 34.e-.e-
TORSION4: P_diag e- 3.e- | P_trid e-.4e- | P_1 e-.7e- | P_2 6 3.e-.e-
TORSION5: P_diag e-.4e- | P_trid e-.e- | P_1 e-.9e- | P_2 e-.4e-
TORSION6: P_diag e-.9e- | P_trid e-.0e- | P_1 e-.9e- | P_2 7 7.3e-.e-
TORSIONA: P_diag e-.9e- | P_trid e-.8e- | P_1 e-.6e- | P_2 3 3.e-.3e-
TORSIONB: P_diag 7 4.7e- 3.8e- | P_trid e- 3.e- | P_1 e-.8e- | P_2 3 33.e-.9e-
TORSIONC: P_diag e- 3.e- | P_trid e-.6e- | P_1 e-.e- | P_2 5 3.e-.0e-
TORSIOND: P_diag e- 3.e- | P_trid e- 3.0e- | P_1 e-.9e- | P_2 5 3.3e-.4e-
TORSIONE: P_diag e-.6e- | P_trid e-.3e- | P_1 6 3.0e-.0e- | P_2 8 8.4e-.e-
TORSIONF: P_diag e-.9e- | P_trid e-.8e- | P_1 e-.0e- | P_2 5 8.3e-.0e-

Table A.2. Comparison of P_diag, P_trid, P_1 and P_2 in the solution of CUTEr problems by quadprog, with cg_tol = 10^{-1} (continued).


More information

6.4 Krylov Subspaces and Conjugate Gradients

6.4 Krylov Subspaces and Conjugate Gradients 6.4 Krylov Subspaces and Conjugate Gradients Our original equation is Ax = b. The preconditioned equation is P Ax = P b. When we write P, we never intend that an inverse will be explicitly computed. P

More information

Lecture 18 Classical Iterative Methods

Lecture 18 Classical Iterative Methods Lecture 18 Classical Iterative Methods MIT 18.335J / 6.337J Introduction to Numerical Methods Per-Olof Persson November 14, 2006 1 Iterative Methods for Linear Systems Direct methods for solving Ax = b,

More information

Key words. convex quadratic programming, degeneracy, Interior Point methods, Inexact Newton methods, global convergence.

Key words. convex quadratic programming, degeneracy, Interior Point methods, Inexact Newton methods, global convergence. AN INTERIOR POINT NEWTON-LIKE METHOD FOR NONNEGATIVE LEAST SQUARES PROBLEMS WITH DEGENERATE SOLUTION STEFANIA BELLAVIA, MARIA MACCONI AND BENEDETTA MORINI Abstract. An interior point approach for medium

More information

On fast trust region methods for quadratic models with linear constraints. M.J.D. Powell

On fast trust region methods for quadratic models with linear constraints. M.J.D. Powell DAMTP 2014/NA02 On fast trust region methods for quadratic models with linear constraints M.J.D. Powell Abstract: Quadratic models Q k (x), x R n, of the objective function F (x), x R n, are used by many

More information

Notes on PCG for Sparse Linear Systems

Notes on PCG for Sparse Linear Systems Notes on PCG for Sparse Linear Systems Luca Bergamaschi Department of Civil Environmental and Architectural Engineering University of Padova e-mail luca.bergamaschi@unipd.it webpage www.dmsa.unipd.it/

More information

Unconstrained optimization

Unconstrained optimization Chapter 4 Unconstrained optimization An unconstrained optimization problem takes the form min x Rnf(x) (4.1) for a target functional (also called objective function) f : R n R. In this chapter and throughout

More information

Lecture Note 7: Iterative methods for solving linear systems. Xiaoqun Zhang Shanghai Jiao Tong University

Lecture Note 7: Iterative methods for solving linear systems. Xiaoqun Zhang Shanghai Jiao Tong University Lecture Note 7: Iterative methods for solving linear systems Xiaoqun Zhang Shanghai Jiao Tong University Last updated: December 24, 2014 1.1 Review on linear algebra Norms of vectors and matrices vector

More information

Conjugate Gradient Method

Conjugate Gradient Method Conjugate Gradient Method direct and indirect methods positive definite linear systems Krylov sequence spectral analysis of Krylov sequence preconditioning Prof. S. Boyd, EE364b, Stanford University Three

More information

Optimization and Root Finding. Kurt Hornik

Optimization and Root Finding. Kurt Hornik Optimization and Root Finding Kurt Hornik Basics Root finding and unconstrained smooth optimization are closely related: Solving ƒ () = 0 can be accomplished via minimizing ƒ () 2 Slide 2 Basics Root finding

More information

1. Introduction. We consider nonlinear optimization problems of the form. f(x) ce (x) = 0 c I (x) 0,

1. Introduction. We consider nonlinear optimization problems of the form. f(x) ce (x) = 0 c I (x) 0, AN INTERIOR-POINT ALGORITHM FOR LARGE-SCALE NONLINEAR OPTIMIZATION WITH INEXACT STEP COMPUTATIONS FRANK E. CURTIS, OLAF SCHENK, AND ANDREAS WÄCHTER Abstract. We present a line-search algorithm for large-scale

More information

Chapter 3 Transformations

Chapter 3 Transformations Chapter 3 Transformations An Introduction to Optimization Spring, 2014 Wei-Ta Chu 1 Linear Transformations A function is called a linear transformation if 1. for every and 2. for every If we fix the bases

More information

On Lagrange multipliers of trust region subproblems

On Lagrange multipliers of trust region subproblems On Lagrange multipliers of trust region subproblems Ladislav Lukšan, Ctirad Matonoha, Jan Vlček Institute of Computer Science AS CR, Prague Applied Linear Algebra April 28-30, 2008 Novi Sad, Serbia Outline

More information

Conjugate Gradient (CG) Method

Conjugate Gradient (CG) Method Conjugate Gradient (CG) Method by K. Ozawa 1 Introduction In the series of this lecture, I will introduce the conjugate gradient method, which solves efficiently large scale sparse linear simultaneous

More information

MS&E 318 (CME 338) Large-Scale Numerical Optimization

MS&E 318 (CME 338) Large-Scale Numerical Optimization Stanford University, Management Science & Engineering (and ICME) MS&E 318 (CME 338) Large-Scale Numerical Optimization 1 Origins Instructor: Michael Saunders Spring 2015 Notes 9: Augmented Lagrangian Methods

More information

Math Introduction to Numerical Analysis - Class Notes. Fernando Guevara Vasquez. Version Date: January 17, 2012.

Math Introduction to Numerical Analysis - Class Notes. Fernando Guevara Vasquez. Version Date: January 17, 2012. Math 5620 - Introduction to Numerical Analysis - Class Notes Fernando Guevara Vasquez Version 1990. Date: January 17, 2012. 3 Contents 1. Disclaimer 4 Chapter 1. Iterative methods for solving linear systems

More information

Interior-Point Methods as Inexact Newton Methods. Silvia Bonettini Università di Modena e Reggio Emilia Italy

Interior-Point Methods as Inexact Newton Methods. Silvia Bonettini Università di Modena e Reggio Emilia Italy InteriorPoint Methods as Inexact Newton Methods Silvia Bonettini Università di Modena e Reggio Emilia Italy Valeria Ruggiero Università di Ferrara Emanuele Galligani Università di Modena e Reggio Emilia

More information

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the authors institution

More information

ETNA Kent State University

ETNA Kent State University Electronic Transactions on Numerical Analysis. Volume 18, pp. 49-64, 2004. Copyright 2004,. ISSN 1068-9613. ETNA EFFICIENT PRECONDITIONING FOR SEQUENCES OF PARAMETRIC COMPLEX SYMMETRIC LINEAR SYSTEMS D.

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning

AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning Xiangmin Jiao SUNY Stony Brook Xiangmin Jiao Numerical Analysis I 1 / 18 Outline

More information

AMS Mathematics Subject Classification : 65F10,65F50. Key words and phrases: ILUS factorization, preconditioning, Schur complement, 1.

AMS Mathematics Subject Classification : 65F10,65F50. Key words and phrases: ILUS factorization, preconditioning, Schur complement, 1. J. Appl. Math. & Computing Vol. 15(2004), No. 1, pp. 299-312 BILUS: A BLOCK VERSION OF ILUS FACTORIZATION DAVOD KHOJASTEH SALKUYEH AND FAEZEH TOUTOUNIAN Abstract. ILUS factorization has many desirable

More information

Convergence Analysis of Inexact Infeasible Interior Point Method. for Linear Optimization

Convergence Analysis of Inexact Infeasible Interior Point Method. for Linear Optimization Convergence Analysis of Inexact Infeasible Interior Point Method for Linear Optimization Ghussoun Al-Jeiroudi Jacek Gondzio School of Mathematics The University of Edinburgh Mayfield Road, Edinburgh EH9

More information

5 Handling Constraints

5 Handling Constraints 5 Handling Constraints Engineering design optimization problems are very rarely unconstrained. Moreover, the constraints that appear in these problems are typically nonlinear. This motivates our interest

More information

Università di Firenze, via C. Lombroso 6/17, Firenze, Italia,

Università di Firenze, via C. Lombroso 6/17, Firenze, Italia, Convergence of a Regularized Euclidean Residual Algorithm for Nonlinear Least-Squares by S. Bellavia 1, C. Cartis 2, N. I. M. Gould 3, B. Morini 1 and Ph. L. Toint 4 25 July 2008 1 Dipartimento di Energetica

More information

Written Examination

Written Examination Division of Scientific Computing Department of Information Technology Uppsala University Optimization Written Examination 202-2-20 Time: 4:00-9:00 Allowed Tools: Pocket Calculator, one A4 paper with notes

More information

Suppose that the approximate solutions of Eq. (1) satisfy the condition (3). Then (1) if η = 0 in the algorithm Trust Region, then lim inf.

Suppose that the approximate solutions of Eq. (1) satisfy the condition (3). Then (1) if η = 0 in the algorithm Trust Region, then lim inf. Maria Cameron 1. Trust Region Methods At every iteration the trust region methods generate a model m k (p), choose a trust region, and solve the constraint optimization problem of finding the minimum of

More information

Iterative Methods. Splitting Methods

Iterative Methods. Splitting Methods Iterative Methods Splitting Methods 1 Direct Methods Solving Ax = b using direct methods. Gaussian elimination (using LU decomposition) Variants of LU, including Crout and Doolittle Other decomposition

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

A line-search algorithm inspired by the adaptive cubic regularization framework, with a worst-case complexity O(ɛ 3/2 ).

A line-search algorithm inspired by the adaptive cubic regularization framework, with a worst-case complexity O(ɛ 3/2 ). A line-search algorithm inspired by the adaptive cubic regularization framewor, with a worst-case complexity Oɛ 3/. E. Bergou Y. Diouane S. Gratton June 16, 017 Abstract Adaptive regularized framewor using

More information

The amount of work to construct each new guess from the previous one should be a small multiple of the number of nonzeros in A.

The amount of work to construct each new guess from the previous one should be a small multiple of the number of nonzeros in A. AMSC/CMSC 661 Scientific Computing II Spring 2005 Solution of Sparse Linear Systems Part 2: Iterative methods Dianne P. O Leary c 2005 Solving Sparse Linear Systems: Iterative methods The plan: Iterative

More information

The Conjugate Gradient Method

The Conjugate Gradient Method The Conjugate Gradient Method Classical Iterations We have a problem, We assume that the matrix comes from a discretization of a PDE. The best and most popular model problem is, The matrix will be as large

More information

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 Instructions Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 The exam consists of four problems, each having multiple parts. You should attempt to solve all four problems. 1.

More information

Tikhonov Regularization of Large Symmetric Problems

Tikhonov Regularization of Large Symmetric Problems NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS Numer. Linear Algebra Appl. 2000; 00:1 11 [Version: 2000/03/22 v1.0] Tihonov Regularization of Large Symmetric Problems D. Calvetti 1, L. Reichel 2 and A. Shuibi

More information

CS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares

CS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares CS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares Robert Bridson October 29, 2008 1 Hessian Problems in Newton Last time we fixed one of plain Newton s problems by introducing line search

More information

Trust-Region SQP Methods with Inexact Linear System Solves for Large-Scale Optimization

Trust-Region SQP Methods with Inexact Linear System Solves for Large-Scale Optimization Trust-Region SQP Methods with Inexact Linear System Solves for Large-Scale Optimization Denis Ridzal Department of Computational and Applied Mathematics Rice University, Houston, Texas dridzal@caam.rice.edu

More information

CONVERGENCE BEHAVIOUR OF INEXACT NEWTON METHODS

CONVERGENCE BEHAVIOUR OF INEXACT NEWTON METHODS MATHEMATICS OF COMPUTATION Volume 68, Number 228, Pages 165 1613 S 25-5718(99)1135-7 Article electronically published on March 1, 1999 CONVERGENCE BEHAVIOUR OF INEXACT NEWTON METHODS BENEDETTA MORINI Abstract.

More information

Lab 1: Iterative Methods for Solving Linear Systems

Lab 1: Iterative Methods for Solving Linear Systems Lab 1: Iterative Methods for Solving Linear Systems January 22, 2017 Introduction Many real world applications require the solution to very large and sparse linear systems where direct methods such as

More information

A robust multilevel approximate inverse preconditioner for symmetric positive definite matrices

A robust multilevel approximate inverse preconditioner for symmetric positive definite matrices DICEA DEPARTMENT OF CIVIL, ENVIRONMENTAL AND ARCHITECTURAL ENGINEERING PhD SCHOOL CIVIL AND ENVIRONMENTAL ENGINEERING SCIENCES XXX CYCLE A robust multilevel approximate inverse preconditioner for symmetric

More information

Multipoint secant and interpolation methods with nonmonotone line search for solving systems of nonlinear equations

Multipoint secant and interpolation methods with nonmonotone line search for solving systems of nonlinear equations Multipoint secant and interpolation methods with nonmonotone line search for solving systems of nonlinear equations Oleg Burdakov a,, Ahmad Kamandi b a Department of Mathematics, Linköping University,

More information

Improving an interior-point algorithm for multicommodity flows by quadratic regularizations

Improving an interior-point algorithm for multicommodity flows by quadratic regularizations Improving an interior-point algorithm for multicommodity flows by quadratic regularizations Jordi Castro Jordi Cuesta Dept. of Stat. and Operations Research Dept. of Chemical Engineering Universitat Politècnica

More information

Solving linear equations with Gaussian Elimination (I)

Solving linear equations with Gaussian Elimination (I) Term Projects Solving linear equations with Gaussian Elimination The QR Algorithm for Symmetric Eigenvalue Problem The QR Algorithm for The SVD Quasi-Newton Methods Solving linear equations with Gaussian

More information

Key words. conjugate gradients, normwise backward error, incremental norm estimation.

Key words. conjugate gradients, normwise backward error, incremental norm estimation. Proceedings of ALGORITMY 2016 pp. 323 332 ON ERROR ESTIMATION IN THE CONJUGATE GRADIENT METHOD: NORMWISE BACKWARD ERROR PETR TICHÝ Abstract. Using an idea of Duff and Vömel [BIT, 42 (2002), pp. 300 322

More information

9.1 Preconditioned Krylov Subspace Methods

9.1 Preconditioned Krylov Subspace Methods Chapter 9 PRECONDITIONING 9.1 Preconditioned Krylov Subspace Methods 9.2 Preconditioned Conjugate Gradient 9.3 Preconditioned Generalized Minimal Residual 9.4 Relaxation Method Preconditioners 9.5 Incomplete

More information

Course Notes: Week 1

Course Notes: Week 1 Course Notes: Week 1 Math 270C: Applied Numerical Linear Algebra 1 Lecture 1: Introduction (3/28/11) We will focus on iterative methods for solving linear systems of equations (and some discussion of eigenvalues

More information

A Fast Implicit QR Eigenvalue Algorithm for Companion Matrices

A Fast Implicit QR Eigenvalue Algorithm for Companion Matrices A Fast Implicit QR Eigenvalue Algorithm for Companion Matrices D. A. Bini a,1 P. Boito a,1 Y. Eidelman b L. Gemignani a,1 I. Gohberg b a Dipartimento di Matematica, Università di Pisa, Largo Bruno Pontecorvo

More information

Solving Symmetric Indefinite Systems with Symmetric Positive Definite Preconditioners

Solving Symmetric Indefinite Systems with Symmetric Positive Definite Preconditioners Solving Symmetric Indefinite Systems with Symmetric Positive Definite Preconditioners Eugene Vecharynski 1 Andrew Knyazev 2 1 Department of Computer Science and Engineering University of Minnesota 2 Department

More information

FINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION

FINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION FINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION EDMOND CHOW AND AFTAB PATEL Abstract. This paper presents a new fine-grained parallel algorithm for computing an incomplete LU factorization. All nonzeros

More information

Research Reports on Mathematical and Computing Sciences

Research Reports on Mathematical and Computing Sciences ISSN 1342-284 Research Reports on Mathematical and Computing Sciences Exploiting Sparsity in Linear and Nonlinear Matrix Inequalities via Positive Semidefinite Matrix Completion Sunyoung Kim, Masakazu

More information

1. Introduction. In this work we consider the solution of finite-dimensional constrained optimization problems of the form

1. Introduction. In this work we consider the solution of finite-dimensional constrained optimization problems of the form MULTILEVEL ALGORITHMS FOR LARGE-SCALE INTERIOR POINT METHODS MICHELE BENZI, ELDAD HABER, AND LAUREN TARALLI Abstract. We develop and compare multilevel algorithms for solving constrained nonlinear variational

More information

Newton-like method with diagonal correction for distributed optimization

Newton-like method with diagonal correction for distributed optimization Newton-lie method with diagonal correction for distributed optimization Dragana Bajović Dušan Jaovetić Nataša Krejić Nataša Krlec Jerinić August 15, 2015 Abstract We consider distributed optimization problems

More information

Numerical optimization. Numerical optimization. Longest Shortest where Maximal Minimal. Fastest. Largest. Optimization problems

Numerical optimization. Numerical optimization. Longest Shortest where Maximal Minimal. Fastest. Largest. Optimization problems 1 Numerical optimization Alexander & Michael Bronstein, 2006-2009 Michael Bronstein, 2010 tosca.cs.technion.ac.il/book Numerical optimization 048921 Advanced topics in vision Processing and Analysis of

More information

In order to solve the linear system KL M N when K is nonsymmetric, we can solve the equivalent system

In order to solve the linear system KL M N when K is nonsymmetric, we can solve the equivalent system !"#$% "&!#' (%)!#" *# %)%(! #! %)!#" +, %"!"#$ %*&%! $#&*! *# %)%! -. -/ 0 -. 12 "**3! * $!#%+,!2!#% 44" #% &#33 # 4"!#" "%! "5"#!!#6 -. - #% " 7% "3#!#3! - + 87&2! * $!#% 44" ) 3( $! # % %#!!#%+ 9332!

More information

Penalty and Barrier Methods General classical constrained minimization problem minimize f(x) subject to g(x) 0 h(x) =0 Penalty methods are motivated by the desire to use unconstrained optimization techniques

More information

Efficient smoothers for all-at-once multigrid methods for Poisson and Stokes control problems

Efficient smoothers for all-at-once multigrid methods for Poisson and Stokes control problems Efficient smoothers for all-at-once multigrid methods for Poisson and Stoes control problems Stefan Taacs stefan.taacs@numa.uni-linz.ac.at, WWW home page: http://www.numa.uni-linz.ac.at/~stefant/j3362/

More information

Notes on Some Methods for Solving Linear Systems

Notes on Some Methods for Solving Linear Systems Notes on Some Methods for Solving Linear Systems Dianne P. O Leary, 1983 and 1999 and 2007 September 25, 2007 When the matrix A is symmetric and positive definite, we have a whole new class of algorithms

More information

Numerical Methods I Solving Nonlinear Equations

Numerical Methods I Solving Nonlinear Equations Numerical Methods I Solving Nonlinear Equations Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 October 16th, 2014 A. Donev (Courant Institute)

More information

Politecnico di Torino. Porto Institutional Repository

Politecnico di Torino. Porto Institutional Repository Politecnico di Torino Porto Institutional Repository [Article] Constrained dogleg methods for nonlinear systems with simple bounds Original Citation: Bellavia S., Macconi M., Pieraccini S. (2012). Constrained

More information

A PROJECTED HESSIAN GAUSS-NEWTON ALGORITHM FOR SOLVING SYSTEMS OF NONLINEAR EQUATIONS AND INEQUALITIES

A PROJECTED HESSIAN GAUSS-NEWTON ALGORITHM FOR SOLVING SYSTEMS OF NONLINEAR EQUATIONS AND INEQUALITIES IJMMS 25:6 2001) 397 409 PII. S0161171201002290 http://ijmms.hindawi.com Hindawi Publishing Corp. A PROJECTED HESSIAN GAUSS-NEWTON ALGORITHM FOR SOLVING SYSTEMS OF NONLINEAR EQUATIONS AND INEQUALITIES

More information

Multigrid absolute value preconditioning

Multigrid absolute value preconditioning Multigrid absolute value preconditioning Eugene Vecharynski 1 Andrew Knyazev 2 (speaker) 1 Department of Computer Science and Engineering University of Minnesota 2 Department of Mathematical and Statistical

More information

DIPARTIMENTO DI MATEMATICA

DIPARTIMENTO DI MATEMATICA DIPARTIMENTO DI MATEMATICA Complesso universitario, via Vivaldi - 81100 Caserta ON MUTUAL IMPACT OF NUMERICAL LINEAR ALGEBRA AND LARGE-SCALE OPTIMIZATION WITH FOCUS ON INTERIOR POINT METHODS Marco D Apuzzo

More information

c 2007 Society for Industrial and Applied Mathematics

c 2007 Society for Industrial and Applied Mathematics SIAM J. OPTIM. Vol. 18, No. 1, pp. 106 13 c 007 Society for Industrial and Applied Mathematics APPROXIMATE GAUSS NEWTON METHODS FOR NONLINEAR LEAST SQUARES PROBLEMS S. GRATTON, A. S. LAWLESS, AND N. K.

More information

Linear Algebra Massoud Malek

Linear Algebra Massoud Malek CSUEB Linear Algebra Massoud Malek Inner Product and Normed Space In all that follows, the n n identity matrix is denoted by I n, the n n zero matrix by Z n, and the zero vector by θ n An inner product

More information

4.6 Iterative Solvers for Linear Systems

4.6 Iterative Solvers for Linear Systems 4.6 Iterative Solvers for Linear Systems Why use iterative methods? Virtually all direct methods for solving Ax = b require O(n 3 ) floating point operations. In practical applications the matrix A often

More information

Iterative methods for Linear System

Iterative methods for Linear System Iterative methods for Linear System JASS 2009 Student: Rishi Patil Advisor: Prof. Thomas Huckle Outline Basics: Matrices and their properties Eigenvalues, Condition Number Iterative Methods Direct and

More information

Comparative Analysis of Mesh Generators and MIC(0) Preconditioning of FEM Elasticity Systems

Comparative Analysis of Mesh Generators and MIC(0) Preconditioning of FEM Elasticity Systems Comparative Analysis of Mesh Generators and MIC(0) Preconditioning of FEM Elasticity Systems Nikola Kosturski and Svetozar Margenov Institute for Parallel Processing, Bulgarian Academy of Sciences Abstract.

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 6 Optimization Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction permitted

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 6 Optimization Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction permitted

More information

Numerical optimization

Numerical optimization Numerical optimization Lecture 4 Alexander & Michael Bronstein tosca.cs.technion.ac.il/book Numerical geometry of non-rigid shapes Stanford University, Winter 2009 2 Longest Slowest Shortest Minimal Maximal

More information

AN AUGMENTED LAGRANGIAN AFFINE SCALING METHOD FOR NONLINEAR PROGRAMMING

AN AUGMENTED LAGRANGIAN AFFINE SCALING METHOD FOR NONLINEAR PROGRAMMING AN AUGMENTED LAGRANGIAN AFFINE SCALING METHOD FOR NONLINEAR PROGRAMMING XIAO WANG AND HONGCHAO ZHANG Abstract. In this paper, we propose an Augmented Lagrangian Affine Scaling (ALAS) algorithm for general

More information

An Alternative Three-Term Conjugate Gradient Algorithm for Systems of Nonlinear Equations

An Alternative Three-Term Conjugate Gradient Algorithm for Systems of Nonlinear Equations International Journal of Mathematical Modelling & Computations Vol. 07, No. 02, Spring 2017, 145-157 An Alternative Three-Term Conjugate Gradient Algorithm for Systems of Nonlinear Equations L. Muhammad

More information

IP-PCG An interior point algorithm for nonlinear constrained optimization

IP-PCG An interior point algorithm for nonlinear constrained optimization IP-PCG An interior point algorithm for nonlinear constrained optimization Silvia Bonettini (bntslv@unife.it), Valeria Ruggiero (rgv@unife.it) Dipartimento di Matematica, Università di Ferrara December

More information

AM 205: lecture 19. Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods

AM 205: lecture 19. Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods AM 205: lecture 19 Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods Quasi-Newton Methods General form of quasi-newton methods: x k+1 = x k α

More information

Convergence of a Two-parameter Family of Conjugate Gradient Methods with a Fixed Formula of Stepsize

Convergence of a Two-parameter Family of Conjugate Gradient Methods with a Fixed Formula of Stepsize Bol. Soc. Paran. Mat. (3s.) v. 00 0 (0000):????. c SPM ISSN-2175-1188 on line ISSN-00378712 in press SPM: www.spm.uem.br/bspm doi:10.5269/bspm.v38i6.35641 Convergence of a Two-parameter Family of Conjugate

More information