Linear algebra issues in Interior Point methods for bound-constrained least-squares problems

Stefania Bellavia, Dipartimento di Energetica "S. Stecco", Università degli Studi di Firenze

Joint work with Jacek Gondzio and Benedetta Morini

Bologna, 6th-7th March 2008
Outline

1 Introduction
  The problem
  The Inexact Interior Point Framework
  Focus on the Linear Algebra Phase
2 Iterative Linear Algebra
  The preconditioner
  Spectral properties
  PPCG
Bound Constrained Least-Squares Problems

$$\min_{l \le x \le u} q(x) = \tfrac{1}{2}\|Ax - b\|_2^2 + \mu\|x\|^2$$

Vectors $l \in (\mathbb{R} \cup \{-\infty\})^n$ and $u \in (\mathbb{R} \cup \{\infty\})^n$ are lower and upper bounds on the variables. $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, $\mu \ge 0$ are given and $m \ge n$. We expect $A$ to be large and sparse.

We allow the solution $x^*$ to be degenerate:

$$x^*_i = l_i \ \text{or}\ x^*_i = u_i, \quad \nabla q_i(x^*) = 0, \quad \text{for some } i,\ 1 \le i \le n.$$
We limit the presentation to NNLS problems:

$$\min_{x \ge 0} q(x) = \tfrac{1}{2}\|Ax - b\|_2^2$$

We assume $A$ has full column rank, so there is a unique solution $x^*$.

Let $g(x) = \nabla q(x) = A^T(Ax - b)$ and let $D(x)$ be the diagonal matrix with entries

$$d_i(x) = \begin{cases} x_i & \text{if } g_i(x) \ge 0 \\ 1 & \text{otherwise.} \end{cases}$$

The core of our procedure is an Inexact Newton-like method applied to the first-order optimality condition for NNLS:

$$D(x)g(x) = 0.$$
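The scaled optimality condition can be sketched in a few lines of NumPy (the function name and test data are illustrative, not from the talk's Matlab code):

```python
import numpy as np

def scaled_residual(A, b, x):
    """Gradient g(x) = A^T (A x - b) and the scaling d_i = x_i if
    g_i(x) >= 0, 1 otherwise; returns g and the residual D(x) g(x)."""
    g = A.T @ (A @ x - b)
    d = np.where(g >= 0.0, x, 1.0)
    return g, d * g

# At an unconstrained least-squares solution with positive entries,
# g(x) = 0, so the scaled residual D(x) g(x) vanishes as well.
A = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
x_star = np.array([1.0, 2.0])
b = A @ x_star
g, res = scaled_residual(A, b, x_star)
```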
Inexact Newton Interior Point methods for $D(x)g(x) = 0$ [Bellavia, Macconi, Morini, NLAA, 2006]

The method uses ideas of [Heinkenschloss, Ulbrich, Ulbrich, Math. Progr., 1999].

Let $E(x)$ be the diagonal positive semidefinite matrix with entries

$$e_i(x) = \begin{cases} g_i(x) & \text{if } 0 \le g_i(x) < x_i^2 \ \text{or}\ g_i(x)^{1/2} > x_i \\ 0 & \text{otherwise.} \end{cases}$$

Let $W(x)$ and $S(x)$ be the diagonal matrices

$$W(x) = (E(x) + D(x))^{-1}, \qquad S(x) = (W(x)D(x))^{1/2}.$$

Note that $(S(x)^2)_{i,i} \in (0,1]$ and $(W(x)E(x))_{i,i} \in [0,1)$.
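The diagonal scalings can be computed componentwise; a minimal sketch, assuming the rules stated above (function name illustrative):

```python
import numpy as np

def scalings(x, g):
    """Diagonals of D, E, W = (E + D)^{-1}, S = (W D)^{1/2} for x > 0."""
    d = np.where(g >= 0.0, x, 1.0)
    e = np.where((g >= 0.0) & ((g < x**2) | (np.sqrt(np.abs(g)) > x)),
                 g, 0.0)
    w = 1.0 / (e + d)
    s = np.sqrt(w * d)
    return d, e, w, s

# The identity S^2 + W E = I (used later to split the indices) follows
# directly from these definitions.
x = np.array([0.5, 1.0, 2.0])
g = np.array([0.1, -0.3, 4.0])
d, e, w, s = scalings(x, g)
```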
k-th iteration

Solve the s.p.d. system

$$Z_k \tilde{p}_k = -S_k g_k + r_k, \qquad \|r_k\| \le \eta_k \|W_k D_k g_k\|,$$

where $\eta_k \in [0,1)$ and $Z_k = Z(x_k)$ is given by

$$Z_k = S_k^T (A^T A + D_k^{-1} E_k) S_k = S_k^T A^T A S_k + W_k E_k.$$

Form the step $p_k = S_k \tilde{p}_k$.

Project it onto an interior of the positive orthant:

$$\hat{p}_k = \max\{\sigma,\, 1 - \|P(x_k + p_k) - x_k\|\}\,(P(x_k + p_k) - x_k),$$

where $\sigma \in (0,1)$ is close to one.
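The damped projection step can be sketched as follows ($P$ is the projection onto the nonnegative orthant; data are a toy illustration):

```python
import numpy as np

def damped_projected_step(x, p, sigma=0.995):
    """p_hat = max(sigma, 1 - ||P(x + p) - x||) * (P(x + p) - x);
    keeps the next iterate strictly inside the positive orthant."""
    pp = np.maximum(x + p, 0.0) - x            # P(x + p) - x
    alpha = max(sigma, 1.0 - np.linalg.norm(pp))
    return alpha * pp

x = np.array([1.0, 1.0])
p = np.array([-2.0, 0.5])    # the full step would leave the orthant
p_hat = damped_projected_step(x, p)
```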
Globalization Phase

Set

$$x_{k+1} = x_k + (1 - t)\hat{p}_k + t\,p^C_k, \qquad t \in [0,1),$$

where $p^C_k$ is a constrained Cauchy step.

$t$ is chosen to guarantee a sufficient decrease of the objective function $q(x)$.

Iterates remain strictly positive.

Eventually $t = 0$ is taken, and quadratic convergence can be obtained without assuming strict complementarity at $x^*$.
The Linear Algebra Phase: normal equations

The system $Z_k \tilde{p}_k = -S_k g_k$ represents the normal equations for the least-squares problem

$$\min_{\tilde p \in \mathbb{R}^n} \|B_\delta \tilde p + h\|$$

with

$$B_\delta = \begin{pmatrix} A S_k \\ W_k^{1/2} E_k^{1/2} \end{pmatrix}, \qquad h = \begin{pmatrix} A x_k - b \\ 0 \end{pmatrix}.$$
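The identity between the normal-equations matrix and $B_\delta^T B_\delta$ is easy to verify numerically (a dense sketch with random data, not the talk's code):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 3
A = rng.standard_normal((m, n))
s = rng.uniform(0.1, 1.0, n)       # diagonal of S_k
we = rng.uniform(0.0, 0.9, n)      # diagonal of W_k E_k

S = np.diag(s)
Z = S @ A.T @ A @ S + np.diag(we)                # Z_k = S A^T A S + W E
B = np.vstack([A @ S, np.diag(np.sqrt(we))])     # B = [A S; (W E)^{1/2}]
```

Forming $B^T B$ recovers $Z_k$ exactly, which is why the augmented/least-squares formulations below are interchangeable with the normal equations.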
The Linear Algebra Phase: augmented system

The step $\tilde{p}_k$ can be obtained by solving

$$\underbrace{\begin{pmatrix} I & A S_k \\ S_k A^T & -W_k E_k \end{pmatrix}}_{H_\delta} \begin{pmatrix} q_k \\ \tilde{p}_k \end{pmatrix} = \begin{pmatrix} -(A x_k - b) \\ 0 \end{pmatrix}.$$

Note that $W_k E_k$ is positive semidefinite and

$$v^T W_k E_k v \ge \delta\, v^T v, \quad \forall v \in \mathbb{R}^n, \qquad \text{where } 1 > \delta = \min_i (w_k e_k)_i.$$
Conditioning issues

Let $0 < \sigma_1 \le \sigma_2 \le \dots \le \sigma_n$ be the singular values of $A S_k$ and assume $\sigma_1 \ll 1$.

If $\delta = 0$ then

$$\kappa_2(H_0) \approx \frac{1 + \sigma_n}{\sigma_1^2}, \qquad \kappa_2(B_0) \approx \frac{1 + \sigma_n}{\sigma_1},$$

i.e. $\kappa_2(H_0)$ may be much greater than $\kappa_2(B_0)$.

If $\delta > 0$ (regularized system), then

$$\kappa_2(H_\delta) \approx \frac{1 + \sigma_n}{\delta}, \qquad \kappa_2(B_\delta) \approx \frac{1 + \sigma_n}{\sqrt{\delta}},$$

i.e. if $\delta > \sigma_1$, then $\kappa_2(H_\delta)$ (resp. $\kappa_2(B_\delta)$) may be considerably smaller than $\kappa_2(H_0)$ (resp. $\kappa_2(B_0)$).
The Regularized I.P. Newton-like method

If $\sigma_1$ is not small, the regularization does not deteriorate $\kappa_2(H_\delta)$ with respect to $\kappa_2(H_0)$. Clear benefit from regularization (see also [Saunders, BIT, 1995], [Silvester and Wathen, SINUM, 1994]).

Modification of the Affine Scaling I.P. method:

$$\bar Z_k \tilde p_k = -S_k g_k + r_k$$

where

$$\bar Z_k = S_k^T (A^T A + D_k^{-1} E_k + \Lambda_k) S_k = \underbrace{S_k^T A^T A S_k + W_k E_k}_{Z_k} + \Lambda_k S_k^2$$

and $\Lambda_k$ is diagonal with entries in $[0,1)$.
Fast convergence of the method is preserved (in the presence of degeneracy, too).

The globalization strategy of [BMM] can be applied with slight modifications.

The least-squares problem and the augmented system are regularized:

$$B_\delta = \begin{pmatrix} A S_k \\ C_k^{1/2} \end{pmatrix}, \qquad H_\delta = \begin{pmatrix} I & A S_k \\ S_k A^T & -C_k \end{pmatrix},$$

where $C_k = W_k E_k + \Lambda_k S_k^2$.
Features of the method

Let $\tau \in (0,1)$ be a small positive threshold and

$$I_k = \{ i \in \{1,2,\dots,n\} \ \text{s.t.}\ (s_k^2)_i \ge 1 - \tau \}, \qquad A_k = \{1,2,\dots,n\} \setminus I_k, \qquad n_1 = \mathrm{card}(I_k),$$

so that $S_k = \mathrm{diag}((S_k)_I, (S_k)_A)$. Note that $S_k^2 + W_k E_k = I$.

When $x_k$ converges to $x^*$:

$$\lim_k (S_k)_I = I, \quad \lim_k (S_k)_A = 0, \quad \lim_k (W_k E_k)_I = 0, \quad \lim_k (W_k E_k)_A = I.$$

$I_k$ is expected to eventually settle down (inactive components and possibly degenerate components).
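The index split can be sketched directly from the diagonal of $S_k$ (the threshold value $\tau = 0.1$ is the one used later in the experiments; names are illustrative):

```python
import numpy as np

def split_indices(s, tau=0.1):
    """I_k = { i : s_i^2 >= 1 - tau } and its complement A_k."""
    inactive = np.flatnonzero(s**2 >= 1.0 - tau)   # I_k
    active = np.flatnonzero(s**2 < 1.0 - tau)      # A_k
    return inactive, active

s = np.array([1.0, 0.2, 0.97])
I_k, A_k = split_indices(s)
```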
Solving the augmented system

The following partition of the augmented system is induced:

$$\begin{pmatrix} I & A_I (S_k)_I & A_A (S_k)_A \\ (S_k)_I A_I^T & -(C_k)_I & 0 \\ (S_k)_A A_A^T & 0 & -(C_k)_A \end{pmatrix} \begin{pmatrix} q_k \\ (\tilde p_k)_I \\ (\tilde p_k)_A \end{pmatrix} = \begin{pmatrix} -(A x_k - b) \\ 0 \\ 0 \end{pmatrix}$$

Eliminating $(\tilde p_k)_A$ we get

$$\underbrace{\begin{pmatrix} I + Q_k & A_I (S_k)_I \\ (S_k)_I A_I^T & -(C_k)_I \end{pmatrix}}_{\bar H_k} \begin{pmatrix} q_k \\ (\tilde p_k)_I \end{pmatrix} = \begin{pmatrix} -(A x_k - b) \\ 0 \end{pmatrix}, \qquad \bar H_k \in \mathbb{R}^{(m+n_1)\times(m+n_1)}.$$
The Preconditioner

Note that

$$\bar H_k = \underbrace{\begin{pmatrix} I & A_I (S_k)_I \\ (S_k)_I A_I^T & -(\Lambda_k S_k^2)_I \end{pmatrix}}_{P_k} + \begin{pmatrix} Q_k & 0 \\ 0 & -(W_k E_k)_I \end{pmatrix}$$

where $Q_k = A_A (S_k C_k^{-1} S_k)_A A_A^T$.

When $x_k$ converges to $x^*$, $(S_k)_A \to 0$ and $(C_k)_A \to I$; then

$$\lim_k Q_k = 0, \qquad \lim_k (W_k E_k)_I = 0.$$
Factorization of the Preconditioner

$$P_k = \begin{pmatrix} I & A_I (S_k)_I \\ (S_k)_I A_I^T & -(\Lambda_k S_k^2)_I \end{pmatrix}$$

can be factorized as

$$P_k = \begin{pmatrix} I & 0 \\ 0 & (S_k)_I \end{pmatrix} \underbrace{\begin{pmatrix} I & A_I \\ A_I^T & -(\Lambda_k)_I \end{pmatrix}}_{\Pi_k} \begin{pmatrix} I & 0 \\ 0 & (S_k)_I \end{pmatrix}.$$

If $I_k$ and $\Lambda_k$ remain unchanged for a few iterations, the factorization of the matrix $\Pi_k$ does not have to be updated. $I_k$ is expected to eventually settle down.
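The factorization-friendly form can be checked numerically: only $\Pi_k$ needs (re)factorizing when $I_k$ and $\Lambda_k$ are frozen, while the diagonal scaling by $(S_k)_I$ may change at every iteration (a dense sketch with random data):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n1 = 5, 2
A_I = rng.standard_normal((m, n1))
s_I = rng.uniform(0.9, 1.0, n1)       # diagonal of (S_k)_I
lam_I = rng.uniform(1e-3, 1e-2, n1)   # diagonal of (Lambda_k)_I

Pi = np.block([[np.eye(m), A_I],
               [A_I.T, -np.diag(lam_I)]])          # Pi_k
D = np.diag(np.concatenate([np.ones(m), s_I]))     # blkdiag(I, (S_k)_I)
P = D @ Pi @ D                                     # factorized form of P_k

P_direct = np.block([[np.eye(m), A_I @ np.diag(s_I)],
                     [np.diag(s_I) @ A_I.T, -np.diag(lam_I * s_I**2)]])
```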
Eigenvalues

$P_k^{-1} \bar H_k$ has at least $m - n + n_1$ eigenvalues at 1.

The other eigenvalues are positive and of the form $\lambda = 1 + \mu$ with

$$\mu = \frac{u^T Q_k u + v^T (W_k E_k)_I v}{u^T u + v^T (\Lambda_k S_k^2)_I v},$$

where $(u^T, v^T)^T$ is an eigenvector associated with $\lambda$.

If $\mu$ is small, the eigenvalues of $P_k^{-1} \bar H_k$ are clustered around one. This is the case when $x_k$ is close to the solution.
Eigenvalues ($x_k$ far away from $x^*$)

The eigenvalues of $P_k^{-1} \bar H_k$ have the form $\lambda = 1 + \mu$ and:

If $(\Lambda_k)_{i,i} = \delta > 0$ for $i \in I_k$:

$$\mu \le \frac{\|A_A (S_k)_A\|^2}{\tau} + \frac{\tau}{\delta(1-\tau)}.$$

If

$$(\Lambda_k)_{i,i} = \begin{cases} (w_k)_i (e_k)_i & \text{for } i \in I_k \text{ and } (w_k)_i (e_k)_i \ne 0 \\ \delta > 0 & \text{for } i \in I_k \text{ and } (w_k)_i (e_k)_i = 0 \end{cases}$$

then

$$\mu \le \frac{\|A_A (S_k)_A\|^2}{\tau} + \frac{1}{1-\tau}.$$

Better distribution of the eigenvalues. A scaling of $A$ at the beginning of the process is advisable.
Solving the augmented system by PPCG

We can adopt the Projected Preconditioned Conjugate Gradient (PPCG) method [Gould, 1999], [Dollar, Gould, Schilders, Wathen, SIMAX, 2006]. It is a CG procedure for solving indefinite systems

$$\begin{pmatrix} H & A \\ A^T & -C \end{pmatrix} \begin{pmatrix} p \\ q \end{pmatrix} = \begin{pmatrix} g \\ 0 \end{pmatrix}$$

with $H \in \mathbb{R}^{m \times m}$ symmetric, $C \in \mathbb{R}^{n \times n}$ ($n \le m$) symmetric, and $A \in \mathbb{R}^{m \times n}$ of full rank, using preconditioners of the form

$$\begin{pmatrix} G & A \\ A^T & -T \end{pmatrix}$$

with $G$ symmetric and $T$ nonsingular.
When $C$ is nonsingular, PPCG is equivalent to applying PCG to the system $(H + A C^{-1} A^T)\, p = g$ with preconditioner $G + A T^{-1} A^T$.

In our case, it is equivalent to applying PCG to the system

$$\underbrace{\left(I + Q_k + A_I (S_k C_k^{-1} S_k)_I A_I^T\right)}_{F_k}\, q_k = -(A x_k - b),$$

using the preconditioner

$$G_k = I + A_I (\Lambda_k)_I^{-1} A_I^T.$$
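The PCG interpretation can be illustrated with a textbook preconditioned CG loop on a toy instance of the same structure, $F = I + A_I M A_I^T$ preconditioned by $G = I + A_I N A_I^T$ (a minimal sketch, not the talk's PPCG implementation):

```python
import numpy as np

def pcg(F, rhs, Ginv, tol=1e-12, maxit=200):
    """Preconditioned CG for F q = rhs, Ginv being the preconditioner
    inverse (dense here; in practice it is applied via a factorization)."""
    q = np.zeros_like(rhs)
    r = rhs.copy()
    z = Ginv @ r
    p = z.copy()
    rz = r @ z
    for _ in range(maxit):
        Fp = F @ p
        alpha = rz / (p @ Fp)
        q = q + alpha * p
        r = r - alpha * Fp
        if np.linalg.norm(r) <= tol * np.linalg.norm(rhs):
            break
        z = Ginv @ r
        rz, rz_old = r @ z, rz
        p = z + (rz / rz_old) * p
    return q

rng = np.random.default_rng(2)
m, n1 = 8, 3
A_I = rng.standard_normal((m, n1))
F = np.eye(m) + A_I @ np.diag(rng.uniform(0.5, 1.0, n1)) @ A_I.T
G = np.eye(m) + A_I @ np.diag(rng.uniform(1.0, 2.0, n1)) @ A_I.T
rhs = rng.standard_normal(m)
q = pcg(F, rhs, np.linalg.inv(G))
```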
Eigenvalues of $G_k^{-1} F_k$

If $(\Lambda_k)_{i,i} = (w_k)_i (e_k)_i$ for $i \in I_k$, then the eigenvalues of $G_k^{-1} F_k$ satisfy

$$\tfrac{1}{2}(1 - \tau) \le \lambda \le 1 + \frac{\|A_A (S_k)_A\|^2}{\tau}.$$

Drawback: differently from the previous results, no cluster of eigenvalues at 1 is guaranteed.

Advantage: PPCG is characterized by a minimization property and requires a fixed amount of work per iteration.
Implementation issues

Dynamic regularization:

$$(\Lambda_k)_{i,i} = \begin{cases} 0, & \text{if } i \notin I_k \ (\text{i.e. } (w_k)_i (e_k)_i > \tau) \\ \min\{\max\{10^{-3}, (w_k)_i (e_k)_i\}, 10^{-2}\}, & \text{otherwise.} \end{cases}$$

Iterative solver: PPCG with an adaptive choice of the tolerance in the stopping criterion. Linear systems are solved with an accuracy that increases as the solution is approached. PPCG is stopped when the preconditioned residual drops below

$$tol = \max\left(10^{-7},\ \eta_k \|W_k D_k g_k\|\,\|A S_k\|^{-1}\right).$$
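The dynamic regularization rule amounts to a componentwise clip of the diagonal of $W_k E_k$ (a sketch; function name illustrative):

```python
import numpy as np

def dynamic_lambda(we, tau=0.1):
    """Diagonal of Lambda_k: 0 off I_k (where (w e)_i > tau),
    else (w e)_i clipped into [1e-3, 1e-2]."""
    return np.where(we > tau, 0.0, np.clip(we, 1e-3, 1e-2))

we = np.array([0.5, 0.0, 5e-3, 5e-2])
lam = dynamic_lambda(we)
```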
To avoid preconditioner factorizations, at iteration $k+1$ we freeze the set $I_k$ and the matrix $\Lambda_k$ if

$$\#(\text{IT PPCG})_k \le 30 \quad \&\quad \mathrm{card}(I_{k+1}) - \mathrm{card}(I_k) \le 10.$$

If $I_k$ is empty (i.e. $(s_k^2)_i < 1 - \tau$ for all $i$), we apply PCG to the normal system

$$(S_k^T A^T A S_k + C_k)\, \tilde p_k = -S_k A^T (A x_k - b).$$
Matlab code, $\epsilon_m \approx 2 \cdot 10^{-16}$. The threshold $\tau$ is set to 0.1. Initial guess $x_0 = (1,\dots,1)^T$.

Successful termination when

$$q_{k-1} - q_k < \epsilon\,(1 + |q_{k-1}|), \qquad \|x_k - x_{k-1}\|_2 \le \epsilon\,(1 + \|x_k\|_2), \qquad \|P(x_k - g_k) - x_k\|_2 < \epsilon^{1/3}\,(1 + \|g_k\|_2),$$

or

$$\|P(x_k - g_k) - x_k\|_2 \le \epsilon,$$

with $\epsilon = 10^{-9}$. A failure is declared after 100 iterations.
Test Problems

The matrix $A$ is the transpose of the matrices in the LPnetlib subset of The University of Florida Sparse Matrix Collection. We discarded the matrices with $m < 1000$ and the matrices that are not full rank: a total of 56 matrices. Dimensions range up to $10^5$.

The vector $b$ is set equal to $b = A(1,1,\dots,1)^T$.

When $\|A\|_1 > 10^3$, we scaled the matrix using a simple row and column scaling scheme.
Numerical Results

On a total of 56 test problems we successfully solved 51:

41 test problems are solved with fewer than 20 nonlinear iterations.

In 40 tests the average number of PPCG iterations does not exceed 40.

In 8 tests the solution is the null vector. At each iteration $I_k = \emptyset$, $S_k^T A^T A S_k + C_k \approx I$, and the convergence of the linear solver is very fast.
Savings in the number of preconditioner factorizations
Percent reduction in the dimension $n$

We solve augmented systems of reduced dimension $m + n_1$.
Future work

More experimentation, using also QMR and GMRES.

Develop a code for the more general problem:

$$\min_{l \le x \le u} q(x) = \tfrac{1}{2}\|Ax - b\|_2^2 + \mu\|x\|^2$$

If $\mu > 0$: $A$ may also be rank deficient, and the augmented systems are naturally regularized.

Comparison with existing codes (e.g. BCLS (Friedlander), PDCO (Saunders)).