Incomplete oblique projections for solving large inconsistent linear systems


Math. Program., Ser. B (2008) 111:273–300
DOI 10.1007/s10107-006-0066-4

FULL LENGTH PAPER

H. D. Scolnik · N. Echebest · M. T. Guardarucci · M. C. Vacchino

Received: 15 September 2004 / Accepted: 21 September 2005 / Published online: 17 January 2007
© Springer-Verlag 2007

Abstract  The authors introduced, in previously published papers, acceleration schemes for Projected Aggregation Methods (PAM) aimed at solving consistent linear systems of equalities and inequalities. They used the basic idea of forcing each iterate to belong to the aggregate hyperplane generated in the previous iteration. That scheme has been applied to a variety of projection algorithms for solving systems of linear equalities or inequalities, proving that the acceleration technique can be successfully used for consistent problems. The aim of this paper is to extend the applicability of those schemes to the inconsistent case, employing incomplete projections onto the set of solutions of the augmented system $Ax - r = b$. These extended algorithms converge to the least squares solution. For that purpose, oblique projections are used and, in particular, variable oblique incomplete projections are introduced. They are defined by means of matrices that penalize the norm of the residuals very strongly in the first iterations, decreasing their influence with the iteration counter in order to fulfill the convergence conditions. The theoretical properties of the new algorithms are analyzed, and numerical experiences are presented comparing their performance with several well-known projection methods.

Dedicated to Clovis Gonzaga on the occasion of his 60th birthday.

H. D. Scolnik: Departamento de Computación, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Pabellón I, Ciudad Universitaria, C1428EGA Buenos Aires, Argentina. e-mail: scolnik@fibertel.com.ar; hugo@dc.uba.ar

N. Echebest, M. T. Guardarucci, M. C. Vacchino: Departamento de Matemática, Facultad de Ciencias Exactas, Universidad Nacional de La Plata, La Plata, Argentina. e-mail: opti@mate.unlp.edu.ar

Keywords  Projected Aggregation Methods · Incomplete projections · Inconsistent systems

Mathematics Subject Classification (2000)  90C25 · 90C30

1 Introduction

Large and sparse systems of linear equations arise in many important applications [7,12], such as image reconstruction from projections, radiation therapy treatment planning, other image processing problems, computational mechanics, optimization, etc. Problems of image reconstruction from projections, after a suitable discretization [6], can be represented by a system of linear equations

$$Ax = b, \qquad (1)$$

where A is an $m \times n$ matrix, m is the number of equations, n the size of the (unknown) image vector, and b is the vector of readings. In practice, those systems are often inconsistent, and one usually seeks a point $x \in \mathbb{R}^n$ that minimizes a certain proximity function. A common approach to such problems is to use algorithms, see for instance Censor and Zenios [5], which employ projections onto convex sets in various ways, using either sequential or simultaneous projections onto the hyperplanes represented by the rows of the complete system or onto blocks of a given partition. A widely used algorithm in Computerized Tomography is the algebraic reconstruction technique (ART), whose origin goes back to Kaczmarz [15], although it is known that in order to get convergence it is necessary to use an underrelaxation parameter that must tend to zero. Among the simultaneous projection algorithms for inconsistent problems we can mention Cimmino, SART [5], Landweber [16], CAV [6] and the more general class of diagonal weighting (DWE) algorithms [4]. In order to prove convergence of this sort of method in the inconsistent case, it is necessary to choose the relaxation parameter in an interval that depends on the largest eigenvalue of $A^T A$ [2]. Recently, a block-iterative scheme based on Landweber's method was published [14] that not only encompasses the above-mentioned algorithms, but also uses a less restrictive condition for proving convergence. The Projected Aggregation Methods (PAM), introduced by Householder and Bauer [13], are more efficient than those of the Cimmino or Kaczmarz type [5,10] when the problems are consistent. Within this context we have developed acceleration schemes [9,19–21] for this sort of algorithm, based on projecting the search directions onto the aggregated hyperplanes, with excellent results. In this paper, we extend the above-mentioned algorithms in order to compute the weighted least squares solution of inconsistent systems. We derive two new algorithms that use a scheme of incomplete oblique projections onto the solution set of the augmented system $Ax - r = b$, for minimizing the proximity function and converging to a weighted least squares solution of the system (1).

More explicitly, in order to solve a possibly inconsistent system $Ax = b$, $A \in \mathbb{R}^{m\times n}$, $b \in \mathbb{R}^m$, $m > n$, we consider the standard problem

$$\min_{x \in \mathbb{R}^n} \|b - Ax\|^2_{D_m}, \qquad (2)$$

where $\|\cdot\|_{D_m}$ denotes the norm induced by the positive definite diagonal matrix $D_m \in \mathbb{R}^{m\times m}$, whose solutions coincide with those of the problem $A^T D_m A x = A^T D_m b$. We define two convex sets in the $(n+m)$-dimensional space $\mathbb{R}^{n+m}$, denoting by $[u; v]$ the vertical concatenation of $u \in \mathbb{R}^n$ with $v \in \mathbb{R}^m$:

$$P = \{p : p = [x; r] \in \mathbb{R}^{n+m},\ x \in \mathbb{R}^n,\ r \in \mathbb{R}^m,\ Ax - r = b\},$$
$$Q = \{q : q = [x; 0] \in \mathbb{R}^{n+m},\ x \in \mathbb{R}^n,\ 0 \in \mathbb{R}^m\},$$

adopting the distance $d(p, q) = \|p - q\|_D$ for all $p \in P$, $q \in Q$. Here D is a diagonal matrix of order $n+m$ whose first n diagonal elements are 1's, while the last m coincide with those of $D_m$. By means of a direct application of the Karush–Kuhn–Tucker (KKT) conditions [5] to the problem

$$\min \{\|p - q\|^2_D : p \in P,\ q \in Q\}, \qquad (3)$$

we obtain that its solutions $p^* \in P$, $p^* = [x^*; r^*]$, and $q^* = [y^*; 0] \in Q$ satisfy $x^* = y^*$, $A^T D_m r^* = 0$, and $\|p^* - q^*\|^2_D = \|r^*\|^2_{D_m}$. Therefore this problem is equivalent to problem (2). This observation led us to develop a new method for solving (2), applying an alternating projections scheme between the sets P and Q, similar to the one of Csiszár and Tusnády [8], but replacing the computation of the exact projections onto P by suitable incomplete or approximate projections, according to the following basic scheme.

Algorithm 1 (Basic Algorithm)
Iterative step: Given $p^k = [x^k; r^k] \in P$, $q^k = [x^k; 0] \in Q$, find $p_a^{k+1} = [x^{k+1}; r^{k+1}] \in P$ as

$$p_a^{k+1} \approx \arg\min\{\|p - q^k\|^2_D : p \in P\},$$

then define $p^{k+1} = p_a^{k+1}$, and $q^{k+1} \in Q$ by means of

$$q^{k+1} = [x^{k+1}; 0] = \arg\min\{\|p^{k+1} - q\|^2_D : q \in Q\}.$$

In order to compute the incomplete projections onto P we apply our ACIMM algorithm as presented in [19], later renamed ACCIM in [21] and the subsequent papers. This iterative algorithm uses simultaneous projections onto the hyperplanes of the constraints of $Ax - r = b$, and is very efficient for solving consistent problems and convenient for computing approximate oblique projections with some required properties.
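The equivalence between problems (2) and (3) stated above is easy to verify numerically. The following NumPy sketch is our illustration, not the authors' code (all names are ours): it solves the weighted normal equations for a small inconsistent system and checks the KKT condition $A^T D_m r^* = 0$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)                 # generically inconsistent since m > n
Dm = np.diag(rng.uniform(1.0, 2.0, m))     # positive definite diagonal weights

# Solve the weighted normal equations A^T Dm A x = A^T Dm b (problem (2)).
x_star = np.linalg.solve(A.T @ Dm @ A, A.T @ Dm @ b)
r_star = A @ x_star - b                    # residual: p* = [x*; r*] lies in P

# KKT condition of problem (3): A^T Dm r* = 0, so [x*; r*] and [x*; 0]
# realize the minimum D-distance between the sets P and Q.
print(np.linalg.norm(A.T @ Dm @ r_star))   # ~1e-15
```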

In the following sections we will present two algorithms based on the same basic scheme and will prove their convergence. One keeps constant the initial matrix D and its induced norm, while the other is a dynamic version in which the matrix of weights $D_m$ changes with the iteration counter. The entries of that matrix are large at the beginning and decrease progressively until reaching convergence. The matrices defining the oblique projections arise from the sparsity pattern of the problem. For the second algorithm they also serve the purpose of penalizing the residuals in each iteration. At the beginning a strong penalization leads to a substantial decrease of the norm of some components of the residual, and later on smaller values allow convergence to a desired solution. The new algorithms can be easily parallelized and implemented for dealing with blocks of constraints. The algorithms are compared with ART [5] and CAV [6], using a problem of image reconstruction from projections. The implementations were done within SNARK93 [1], a software package for testing and evaluating algorithms for image reconstruction problems, using the most efficient values for the relaxation parameters as reported in [6]. We also present other numerical experiences comparing the performance of the new algorithms with those of Landweber [16], Double ART [2] and the extended Kaczmarz method [18].

The paper is organized as follows: in Sect. 2 we briefly review some notation and results necessary for describing the new algorithms. In Sect. 3 the new oblique projection algorithms IOP and IVOP are presented, together with some related results needed for defining them. The convergence theory is in Appendix B. In Sect. 4 some numerical experiences are described, together with some preliminary conclusions. In Appendix A we give some properties of the ACCIM algorithm that were not published before, needed for understanding its application to the new algorithms and for proving their convergence.

2 Projection algorithms

Let us assume we have a system $Ax = b$, $A \in \mathbb{R}^{m\times n}$, $m \geq n$, $b \in \mathbb{R}^m$. If it is compatible, we will denote by $x^*$ any solution of it. From here on, $\|x\|$ denotes the Euclidean norm of $x \in \mathbb{R}^n$, and $\|x\|_D$ the norm induced by a positive definite matrix D. We will also assume that each row of A has Euclidean norm equal to 1. We use the notation $e_i$ for the ith column of $I_n$, where the symbol $I_n$ denotes the identity matrix in $\mathbb{R}^{n\times n}$. We will use the superscript T for the transpose of a matrix. Given $M \in \mathbb{R}^{n\times r}$ we denote by $m_i$ the ith column of $M^T$, and by $R(M)$ the subspace spanned by the columns of M. $P_M$ and $P_M^D$ denote the orthogonal and the oblique projectors onto $R(M)$, whose formulas, under the assumption that $\mathrm{rank}(M) = r$, are $P_M = M(M^T M)^{-1} M^T$ and $P_M^D = M(M^T D M)^{-1} M^T D$, respectively. We will use the notation $R(M)^\perp$ for the D-orthogonal subspace to $R(M)$, and $P^D_{M^\perp}$ for the corresponding projector. In particular, if $M = [v] \in \mathbb{R}^{n\times 1}$, we will use $P^D_{v^\perp}$. We adopt the convention that if $v = 0 \in \mathbb{R}^n$, then $P^D_{v^\perp} = I_n$. We denote a diagonal matrix of order m by $D = \mathrm{diag}(d)$, where $d = (d_1, \ldots, d_m)$.
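For concreteness, the oblique projector formula above can be sketched as follows. This is a minimal illustration under the stated rank and positive definiteness assumptions; the function name is ours.

```python
import numpy as np

def oblique_projector(M, D):
    """Oblique projector P_M^D = M (M^T D M)^{-1} M^T D onto R(M),
    assuming rank(M) equals its number of columns and D is SPD."""
    return M @ np.linalg.solve(M.T @ D @ M, M.T @ D)

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 2))
D = np.diag(rng.uniform(1.0, 3.0, 5))
P = oblique_projector(M, D)
print(np.allclose(P @ P, P))   # idempotent
print(np.allclose(P @ M, M))   # fixes R(M)
```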

For each constraint of $Ax = b$ we denote $L_i = \{x \in \mathbb{R}^n : a_i^T x = b_i\}$, $r_i(x) = a_i^T x - b_i$, and the oblique projection of x onto $L_i$ by

$$P_i^D(x) = x - \frac{r_i(x)}{a_i^T D^{-1} a_i} D^{-1} a_i. \qquad (4)$$

An oblique version of ACCIM [21], a parallel simultaneous projection algorithm for solving consistent problems, using weights $w_i \geq 0$, is described by:

Algorithm 2 (ACCIM)
Initialization: Given $x^0$, set $v = 0$ and $k \leftarrow 0$.
Iterative step: Given $x^k$ and the direction v, compute

$$d^k = \sum_{i=1}^m w_i (P_i^D(x^k) - x^k), \quad \hat{d}^k = P^D_{v^\perp}(d^k), \quad x^{k+1} = x^k + \lambda_k \hat{d}^k,$$

with $\lambda_k$ given by

$$\lambda_k = \frac{\sum_{i=1}^m w_i \|P_i^D(x^k) - x^k\|^2_D}{\|\hat{d}^k\|^2_D}. \qquad (5)$$

Set $v = \hat{d}^k$; $k \leftarrow k + 1$.

Remark 1 The convergence properties of this algorithm were published in [19] and other related results in [21]. In particular, the value of (5) corresponds to the solution of the problem

$$\min_\lambda \|x^k + \lambda \hat{d}^k - x^*\|^2_D. \qquad (6)$$

The necessary condition of optimality for this problem is also sufficient because of its convexity, and therefore in every iteration the following property holds:

$$(x^k + \lambda_k \hat{d}^k - x^*)^T D \hat{d}^k = 0. \qquad (7)$$

Furthermore, it follows immediately from the definition of $\lambda_k$ that the product $\lambda_k \hat{d}^k$ is independent of the length of $d^k$. Therefore, this justifies requiring for the $w_i$ only the condition $w_i \geq 0$, without $\sum_{i=1}^m w_i = 1$. In Appendix A we will present some properties of ACCIM required for analyzing the new algorithms. For the moment, we just point out that the sequence $\{x^k\}$ generated by ACCIM from the initial point $x^0$ converges to the solution $x^*$ of $Ax = b$ satisfying

$$x^* = \arg\min \{\|x - x^0\|^2_D,\ x \in \mathbb{R}^n : Ax = b\}. \qquad (8)$$
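A minimal NumPy sketch of Algorithm 2 may help fix the notation. It is our illustration, not the authors' code (the name `accim` and the loop structure are ours); D is passed as the diagonal of a positive definite diagonal matrix, and we add a small-denominator guard as an implementation detail.

```python
import numpy as np

def accim(A, b, D, w, x0, iters=50):
    """Sketch of ACCIM (Algorithm 2) for a consistent system Ax = b,
    with D the diagonal of an SPD diagonal matrix and weights w_i >= 0."""
    x, v = x0.astype(float), None
    Dinv = 1.0 / D
    denom = np.einsum('ij,j,ij->i', A, Dinv, A)        # a_i^T D^{-1} a_i
    for _ in range(iters):
        r = A @ x - b                                  # residuals r_i(x)
        steps = -(r / denom)[:, None] * (A * Dinv)     # rows: P_i^D(x) - x
        d = (w[:, None] * steps).sum(axis=0)           # aggregate direction d^k
        num = (w * np.einsum('ij,j,ij->i', steps, D, steps)).sum()
        if v is not None:                              # d_hat = P^D_{v-perp}(d)
            d = d - (v @ (D * d)) / (v @ (D * v)) * v
        dDd = d @ (D * d)
        if dDd < 1e-30:                                # solution reached
            break
        x, v = x + (num / dDd) * d, d                  # step length (5)
    return x

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4)); A /= np.linalg.norm(A, axis=1, keepdims=True)
x_true = rng.standard_normal(4)
x = accim(A, A @ x_true, D=np.ones(4), w=np.ones(6), x0=np.zeros(4))
print(np.linalg.norm(A @ x - A @ x_true))              # -> small
```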

3 Inconsistent case

As said in Sect. 1, for possibly inconsistent systems $Ax = b$, $A \in \mathbb{R}^{m\times n}$, $b \in \mathbb{R}^m$, $m > n$, we will consider the standard problem (2), whose solution coincides with the one of the problem

$$A^T D_m A x = A^T D_m b. \qquad (9)$$

When $\mathrm{rank}(A) = n$ and the diagonal matrix of weights $D_m$ is positive definite, we could extend the ACCIM algorithm directly to the inconsistent case if, in the computation of $\lambda_k$, the D-norm is replaced by the $A^T D_m A$-norm, considering

$$\lambda_k = \arg\min_{\lambda \geq 0} \|x^k + \lambda \hat{d}^k - x^*\|^2_{A^T D_m A}, \qquad (10)$$

defining $d^k$ as in ACCIM and $\hat{d}^k = P_{v^\perp}(d^k)$ with $v = A^T D_m A \hat{d}^{k-1}$, an approach that implies a high computational cost. In this paper we present algorithms that are particular cases of Algorithm 1.

3.1 Incomplete oblique projections algorithm

We consider a diagonal weighting matrix $D \in \mathbb{R}^{(n+m)\times(n+m)}$,

$$D = \begin{pmatrix} I_n & 0 \\ 0 & D_m \end{pmatrix}, \qquad (11)$$

and the sets P and Q formerly introduced. Given $p^k \in P$ and its projection $q^k$ onto Q, we denote by $p^D_{\min}(q^k)$ the projection of $q^k$ onto P, which is the solution of the problem

$$\min\{\|p - q^k\|_D : p \in P\}. \qquad (12)$$

In particular, if $p^* = [x^*; r^*] \in P$, with $x^*$ a solution of (9), and its projection onto Q is $q^* = [x^*; 0]$, we get $p^* = p^D_{\min}(q^*)$, and the minimum distance between the sets P and Q is $\|r^*\|^2_{D_m}$. In the new algorithms, given $q^k \in Q$, instead of defining $p^{k+1} = p^D_{\min}(q^k)$ we define $p^{k+1} = p_a^{k+1}$, where $p_a^{k+1} \in P$ is a point obtained by means of the incomplete resolution of problem (12).

Remark 2 The results in [3] allow us to prove that the sequence given by Algorithm 1 is convergent when $p^{k+1} = p^D_{\min}(q^k)$. It is interesting to note that the iterative step with regard to x involving the exact oblique projection, taking into account the KKT conditions for (12) stated as

$$\text{minimize } \tfrac{1}{2}\|[x; r] - [x^k; 0]\|^2_D \quad \text{subject to } Ax - r = b,$$

can be written as

$$x^D_{\min} = x^k + A^T D_m (b - A x^D_{\min}), \quad \text{if } p^D_{\min}(q^k) = [x^D_{\min}; r^D_{\min}]. \qquad (13)$$

Thus, for non-singular matrices $A^T D_m A$, the basic scheme with exact projections as in Csiszár and Tusnády [8] generates a sequence that, in relation to the solution $x^*$ of (9), satisfies

$$x^D_{\min} - x^* = (I_n + A^T D_m A)^{-1} (x^k - x^*). \qquad (14)$$

From (14) it follows that the exact scheme always converges, but its convergence rate depends on the choice of the matrix $D_m$. For instance, if we could choose a matrix $D_m$ leading to an upper bound much less than one for the norm $\|(I_n + A^T D_m A)^{-1}\| = \frac{1}{1 + \lambda_{\min}(A^T D_m A)}$, where $\lambda_{\min}$ stands for the least eigenvalue of a symmetric positive definite matrix, then we would get very fast convergence. The idea to study is to select $D_m$ in such a way that $\lambda_{\min}(A^T D_m A) > \lambda_{\min}(A^T A)$. Since $A^T D_m A$ is positive definite, considering $u^T A^T D_m A u$ for all $u \in \mathbb{R}^n$, $\|u\| = 1$, and multiplying and dividing by $\|Au\|^2$, we get $u^T A^T D_m A u \geq \lambda_{\min}(D_m) \lambda_{\min}(A^T A)$, and therefore $\lambda_{\min}(A^T D_m A) \geq \lambda_{\min}(D_m) \lambda_{\min}(A^T A)$. This means that when $\lambda_{\min}(A^T A) \geq 1$ it would be enough to define a diagonal matrix $D_m$ with entries greater than one, while if $\lambda_{\min}(A^T A) \ll 1$ we need $\lambda_{\min}(D_m)$ to be of the order of $\frac{1}{\lambda_{\min}(A^T A)}$. Then, it would be very helpful to be able to obtain estimates of the least eigenvalue of symmetric positive definite matrices for getting higher rates of convergence of this sort of iterative method. On the other hand, Eq. (13) is algebraically similar to the one satisfied by the iterates of the class of Landweber methods [2]:

$$x^{k+1} = x^k + \gamma A^T D_m (b - A x^k), \qquad (15)$$

and with regard to $x^*$,

$$x^{k+1} - x^* = (I_n - \gamma A^T D_m A)(x^k - x^*). \qquad (16)$$

In order to prove convergence [2,14] it is necessary that $\gamma \in (0, 2/L]$, L being the largest eigenvalue of $A^T D_m A$. This condition requires a good estimation of L. In [2] there are interesting results regarding the estimation of L using the maximum number of non-zero elements in the columns of A. As mentioned before, the algorithm based on exact projections is always convergent but its computational cost is high. Therefore we will present a theory of inexact projections, aiming at similar convergence properties with a much lower computational cost.
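As a point of reference, the exact alternating step (13) amounts to one linear solve per iteration, which is what makes it expensive for large systems. The sketch below (ours, not the authors' code; names are assumptions) performs one exact step by solving $(I_n + A^T D_m A)x = x^k + A^T D_m b$, and one Landweber step (15) for comparison.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 40, 10
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
Dm = np.eye(m)                                         # unweighted case for simplicity

x_star = np.linalg.solve(A.T @ Dm @ A, A.T @ Dm @ b)   # solution of (9)
x_k = np.zeros(n)

# One exact alternating step (13): the error contracts by (I + A^T Dm A)^{-1}, cf. (14).
x_exact = np.linalg.solve(np.eye(n) + A.T @ Dm @ A, x_k + A.T @ Dm @ b)

# One Landweber step (15), with gamma in (0, 2/L].
L = np.linalg.eigvalsh(A.T @ Dm @ A).max()
x_lanw = x_k + (2.0 / L) * A.T @ Dm @ (b - A @ x_k)

print(np.linalg.norm(x_exact - x_star) / np.linalg.norm(x_k - x_star))
print(np.linalg.norm(x_lanw - x_star) / np.linalg.norm(x_k - x_star))
```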

In order to define the inexact projection $p_a^{k+1} \approx p^D_{\min}(q^k)$, we consider the following:

Definition 1 Given an approximation $\hat{p} = [z; \mu]$, $z \in \mathbb{R}^n$, $\mu \in \mathbb{R}^m$, of $p^D_{\min}(q^k)$, we denote by $P(\hat{p}) = [z; \mu + s]$ the solution of the system $Ax - r = b$ that satisfies $\mu + s = Az - b$.

Aiming at obtaining properties of the sequence $\{p^k\}$ generated by the new algorithm that guarantee convergence to the solution of (3), we establish an acceptance condition that an approximation $\hat{p} = [z; \mu]$ of $p^D_{\min}(q^k)$ must satisfy.

Definition 2 (Acceptance condition) An approximation $\hat{p} = [z; \mu]$ of $p^D_{\min}(q^k)$ is acceptable if it satisfies

$$\|\hat{p} - q^k\|^2_D \leq \|p^D_{\min}(q^k) - q^k\|^2_D \quad \text{and} \quad \|\hat{p} - P(\hat{p})\|^2_D \leq \gamma \|\hat{p} - p^k\|^2_D, \quad \text{with } 0 < \gamma < 1. \qquad (17)$$

Remark 3 In particular, $\hat{p} = p^D_{\min}(q^k)$ satisfies (17).

In order to see that it is feasible to find, by an iterative algorithm, an approximation $\hat{p}$ of $p^D_{\min}(q^k)$ satisfying (17), it is necessary to consider its convergence properties. We will use the ACCIM algorithm and, taking into account its properties, we will prove that it is possible to find a $j > 0$ such that $[z^j; \mu^j]$ satisfies (17).

Lemma 1 Given $p^k = [x^k; r^k]$ and $q^k = [x^k; 0]$, let the sequence $\{[z^j; \mu^j]\}$ be generated by the ACCIM algorithm with $[z^0; \mu^0] = q^k$. Then:
(i) If $p^D_{\min}(q^k) \neq p^k$, an index $\hat{j}$ exists such that $[z^{\hat{j}}; \mu^{\hat{j}}]$ satisfies (17).
(ii) If $p^D_{\min}(q^k) = p^k$, then the unique solution of (17) is $p^k$.

Proof Assuming that $\|p^k - p^D_{\min}(q^k)\|_D \neq 0$, by property (iii) of Lemma 2 in Appendix A, every approximation $\hat{p}$ satisfies the Fejér monotonicity condition $\|\hat{p} - q^k\|^2_D \leq \|p^D_{\min}(q^k) - q^k\|^2_D$. In order to see that it also fulfills the remaining condition in (17), we assume that the approximation $\hat{p} = [z^j; \mu^j]$ satisfies

$$\|\hat{p} - P(\hat{p})\|^2_D > \gamma \|\hat{p} - p^k\|^2_D \quad \text{for all } j > 0. \qquad (18)$$

Due to the properties of ACCIM, $\hat{p}$ converges to $p^D_{\min}(q^k)$. If we denote $s^j = Az^j - \mu^j - b$, we know that $\|s^j\|_{D_m}$ goes to 0. From the definition of $P(\hat{p})$ we get that $\|\hat{p} - P(\hat{p})\|^2_D = \|s^j\|^2_{D_m}$, and since $p^D_{\min}(q^k)$ is the solution of (12), we know that $p^D_{\min}(q^k) - q^k$ is D-orthogonal to $p - p^D_{\min}(q^k)$ for all $p \in P$. Furthermore, due to Lemma 2 of Appendix A, $\hat{p} - q^k$ is D-orthogonal to $p - \hat{p}$ for all $p \in P$; then we get that $\hat{p}$ satisfies $\|\hat{p} - p^k\|^2_D = \|\hat{p} - p^D_{\min}(q^k)\|^2_D + \|p^D_{\min}(q^k) - p^k\|^2_D$. Therefore, assuming that inequality (18) holds leads to $\|s^j\|^2_{D_m} > \gamma \|p^D_{\min}(q^k) - p^k\|^2_D$ for all $j > 0$. Since $\|s^j\|_{D_m}$ tends to zero, we get $\|p^D_{\min}(q^k) - p^k\|^2_D = 0$, contradicting the hypothesis $\|p^D_{\min}(q^k) - p^k\|^2_D \neq 0$. Therefore, an index $\hat{j}$ must exist for which $\hat{p} = [z^{\hat{j}}; \mu^{\hat{j}}]$ satisfies condition (17). When $p^D_{\min}(q^k) = p^k$, as a consequence of (iii) of Lemma 2 in Appendix A, we get that $\|\hat{p} - p^k\|^2_D \leq \|\hat{p} - P(\hat{p})\|^2_D$ for all $\hat{p} = [z^j; \mu^j]$. Therefore, the unique vector satisfying (17) is $\hat{p} = p^D_{\min}(q^k) = p^k$.
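The map $P(\hat{p})$ of Definition 1 is a one-line computation: it keeps the x-part of the approximation and repairs the residual part so that the constraint $Ax - r = b$ holds exactly. A sketch (the function name is ours):

```python
import numpy as np

def lift_to_P(A, b, z, mu):
    """Definition 1: given an approximation [z; mu] of the projection onto P,
    return P([z; mu]) = [z; mu + s] with s = A z - mu - b, so that the
    residual part equals A z - b and the constraint A x - r = b holds."""
    s = A @ z - mu - b
    return z, mu + s, s
```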

In order to introduce the alternating incomplete projections algorithm, we define the approximation $p_a^{k+1}$ of $p^D_{\min}(q^k)$ by

$$p_a^{k+1} = P(\hat{p}), \quad \text{if } \hat{p} = [z^j; \mu^j] \text{ satisfies (17)}. \qquad (19)$$

Then, once $p^{k+1} = [x^{k+1}; r^{k+1}] = P(\hat{p})$ is computed, the second step of the new projection algorithm defines $q^{k+1} = [x^{k+1}; 0] \in Q$. In the following we describe a practical algorithm that uses the ACCIM version given in Appendix A.

Algorithm 3 Incomplete oblique projections (IOP)
Initialization: Given $0 < \gamma < 1$, a positive definite diagonal matrix $D_m$ of order m and $p^0 = [x^0; r^0]$, set $r^0 = Ax^0 - b$, $q^0 = [x^0; 0] \in Q$ and $k \leftarrow 0$.
Iterative step: Given $p^k = [x^k; r^k]$, set $q^k = [x^k; 0]$.
Calculate $\hat{p}$, an approximation of $p^D_{\min}(q^k)$ satisfying (17), applying ACCIM as follows: define $y^0 = [z^0; \mu^0] = q^k$ as the initial point. For solving $Ax - r = b$, iterate until finding $y^j = [z^j; \mu^j]$ such that $s^j = Az^j - \mu^j - b$ satisfies (17), that is,

$$\|s^j\|^2_{D_m} \leq \gamma (\|r^k\|^2_{D_m} - S_j), \quad \text{with } S_j = \|y^j - y^0\|^2_D.$$

Define $p^{k+1} = [x^{k+1}; r^{k+1}]$, with $x^{k+1} = z^j$ and $r^{k+1} = \mu^j + s^j$.
Define $q^{k+1} = [x^{k+1}; 0] \in Q$. $k \leftarrow k + 1$.

Remark 4 The inequality used in the algorithm to accept $[z^j; \mu^j]$,

$$\|s^j\|^2_{D_m} \leq \gamma (\|r^k\|^2_{D_m} - S_j), \quad \text{with } S_j = \|y^j - y^0\|^2_D, \qquad (20)$$

arises from replacing in the inequality of (17) $\|\hat{p} - P(\hat{p})\|^2_D = \|s^j\|^2_{D_m}$ and $\|p^k - \hat{p}\|^2_D = \|(p^k - q^k) - (\hat{p} - q^k)\|^2_D$. Then, by (iii) of Lemma 2 in Appendix A, we know that $p^k - \hat{p}$ and $\hat{p} - q^k$ are D-orthogonal; using the fact that $\|p^k - q^k\|^2_D = \|[0; r^k]\|^2_D = \|r^k\|^2_{D_m}$, we get that $\|p^k - \hat{p}\|^2_D = \|r^k\|^2_{D_m} - \|\hat{p} - q^k\|^2_D$. Moreover, using again the results of Lemma 2 in Appendix A, $\|\hat{p} - q^k\|^2_D = \|y^j - y^0\|^2_D$. The convergence results are given in Lemmas 3 and 4 of Appendix B.
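A compact sketch of IOP follows. It is an illustration under stated assumptions, not the authors' implementation: the inner solver here is a plain simultaneous-projection step with the optimal step length (the paper's ACCIM additionally projects the direction onto the complement of the previous one), unit weights $w_i = 1$ are used, and all names (`iop`, `dm`, `inner_max`) are ours.

```python
import numpy as np

def iop(A, b, dm, gamma=0.5, outer=20, inner_max=500):
    """Sketch of IOP (Algorithm 3) with a simplified inner solver on the
    augmented system [A, -I][z; mu] = b; dm is the diagonal of D_m."""
    m, n = A.shape
    D = np.concatenate([np.ones(n), dm])               # diagonal of D in (11)
    Aug = np.hstack([A, -np.eye(m)])                   # rows of [A, -I_m]
    row2 = np.einsum('ij,j,ij->i', Aug, 1.0 / D, Aug)  # a_i^T D^{-1} a_i
    x = np.zeros(n)
    r = A @ x - b
    for _ in range(outer):
        y0 = np.concatenate([x, np.zeros(m)])          # start at q^k = [x^k; 0]
        y = y0.copy()
        for _ in range(inner_max):
            s = Aug @ y - b                            # s^j = A z - mu - b
            Sj = (y - y0) @ (D * (y - y0))             # S_j = ||y^j - y^0||_D^2
            if s @ (dm * s) <= gamma * (r @ (dm * r) - Sj):
                break                                  # acceptance test (20)
            steps = -(s / row2)[:, None] * (Aug / D)   # rows: P_i^D(y) - y
            d = steps.sum(axis=0)
            num = np.einsum('ij,j,ij->i', steps, D, steps).sum()
            dDd = d @ (D * d)
            if dDd < 1e-30:
                break
            y = y + (num / dDd) * d                    # optimal-step update
        x = y[:n]
        r = A @ x - b                                  # p^{k+1} = P(p_hat): r = Az - b
    return x, r
```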

3.2 Incomplete oblique projections using penalization parameters

The equality (14) suggests that if the matrix $D_m$ penalizes each violated constraint, the solution of problem (2) will be obtained more rapidly. In particular, if a diagonal matrix $D_m = \beta_0 C_m$, $\beta_0 > 0$, is considered, with $C_m = \mathrm{diag}(c)$, $c_i > 0$, $i = 1, \ldots, m$, then, applying the formula (13) from an initial point $y^0$, the exact solution of problem (12) has the expression

$$x^D_{\min} = (I_n + \beta_0 A^T C_m A)^{-1} (y^0 + \beta_0 A^T C_m b), \qquad (21)$$

which coincides with

$$x^D_{\min} = \left(\frac{1}{\beta_0} I_n + A^T C_m A\right)^{-1} \left(\frac{1}{\beta_0} y^0 + A^T C_m b\right). \qquad (22)$$

When $\beta_0 > 0$ is large enough, we see that such a solution is an approximation to the solution of the problem $A^T C_m A x = A^T C_m b$. In particular, if $D_m = I_m + \beta_0 C_m$ we get

$$x^D_{\min} = \left(\frac{1}{\beta_0} I_n + A^T \left(\frac{1}{\beta_0} I_m + C_m\right) A\right)^{-1} \left(\frac{1}{\beta_0} y^0 + A^T \left(\frac{1}{\beta_0} I_m + C_m\right) b\right). \qquad (23)$$

Likewise, as in (22), if $\beta_0$ is large enough such a solution is an approximation to the solution of $A^T C_m A x = A^T C_m b$. In the special case when $C_m = \mathrm{diag}(e_i)$, if we penalize with the constant $\beta_0$, we would get an almost zero residual for the ith equation, because the solution of

$$\left(\frac{1}{\beta_0} I_n + A^T \left(\frac{1}{\beta_0} I_m + C_m\right) A\right) x^D_{\min} = \frac{1}{\beta_0} y^0 + A^T \left(\frac{1}{\beta_0} I_m + C_m\right) b \qquad (24)$$

almost satisfies $a_i a_i^T x^D_{\min} = a_i b_i$, that is, $a_i (a_i^T x^D_{\min} - b_i) \approx 0$.

With the purpose of getting a substantial reduction of the components of the residual corresponding to the solution of problem (12), we replace the matrix $D_m$ by a sequence of penalization matrices $D_m^k$. Intuitively speaking, we propose to use large elements in the matrix $D_m^0$ for penalizing some or all of the constraints according to given criteria or previous knowledge of the problem. Later on, the penalization should become progressively weaker in order to get convergence to a desirable solution. In Algorithm 3 we replace the D matrix defined in (11) by

$$D^k = \begin{pmatrix} I_n & 0 \\ 0 & D_m^k \end{pmatrix} \qquad (25)$$

in each iteration, where $D_m^k = I_m + \mathrm{diag}(\alpha_1^k, \ldots, \alpha_m^k)$. For each $i = 1, \ldots, m$, $\alpha_i^k > 0$ is a penalty parameter corresponding to the ith equation. We consider a decreasing sequence $\{\alpha_i^k\}_k$ such that $\lim_{k\to\infty} \alpha_i^k = 0$ for $i = 1, \ldots, m$. We define, for all $p \in P$ and $q \in Q$, $d(p, q) = \|p - q\|_{D^k}$.
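The effect of a large $\beta_0$ in (21)/(22) is easy to observe numerically. The following sketch (ours; all names are assumptions) computes the penalized exact projection (21) for increasing $\beta_0$ and shows it approaching the solution of $A^T C_m A x = A^T C_m b$.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 30, 6
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
c = rng.uniform(1.0, 2.0, m)                    # entries of C_m
y0 = rng.standard_normal(n)

# Target: solution of A^T C_m A x = A^T C_m b.
x_cm = np.linalg.solve(A.T @ (c[:, None] * A), A.T @ (c * b))

for beta0 in (1.0, 1e2, 1e4):
    Dm = beta0 * c                              # D_m = beta0 * C_m
    x_beta = np.linalg.solve(np.eye(n) + A.T @ (Dm[:, None] * A),
                             y0 + A.T @ (Dm * b))    # formula (21)
    print(beta0, np.linalg.norm(x_beta - x_cm))      # -> 0 as beta0 grows
```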

The new algorithm, using variable oblique projections induced by $D^k$, is the following:

Algorithm 4 Incomplete variable oblique projections (IVOP)
Initialization: Given $0 < \gamma < 1$, a positive definite diagonal matrix $D_m^0$ of order m and $p^0 = [x^0; r^0]$, set $r^0 = Ax^0 - b$, $q^0 = [x^0; 0] \in Q$ and $k \leftarrow 0$.
Iterative step: Given $p^k = [x^k; r^k]$ and $D_m^k$, set $q^k = [x^k; 0]$.
Calculate $\hat{p}$, an approximation of $p^{D^k}_{\min}(q^k)$ satisfying (17), applying the version of ACCIM with oblique projections induced by the $D^k$ matrix as follows: define $y^0 = [z^0; \mu^0] = q^k$ as the initial point. For solving $Az - \mu = b$, iterate until finding $y^j = [z^j; \mu^j]$ such that $s^j = Az^j - \mu^j - b$ satisfies (17), that is,

$$\|s^j\|^2_{D_m^k} \leq \gamma (\|r^k\|^2_{D_m^k} - S_j), \quad \text{with } S_j = \|y^j - y^0\|^2_{D^k}.$$

Define $p^{k+1} = [x^{k+1}; r^{k+1}]$, with $x^{k+1} = z^j$ and $r^{k+1} = \mu^j + s^j$.
Define $q^{k+1} = [x^{k+1}; 0] \in Q$. Update $D_m^{k+1}$; $k \leftarrow k + 1$.

It is interesting to point out the difference between the solution of problem (12) using the norm induced by $D^k$ and that obtained with the Euclidean norm in an algorithmic scheme like the one proposed here. We denote by $\|\cdot\|_{\alpha^k}$ the vector norm induced by the matrix $\mathrm{diag}(\alpha^k)$ defined in (25).

Remark 5 Given $q^k = [x^k; 0] \in Q$, if $[x_{\min}; r_{\min}]$ is the solution of problem (12) with $D_m = I_m$, and $[x^{D^k}_{\min}; r^{D^k}_{\min}]$ is the solution of the similar problem with $D_m = D_m^k$, then we obtain the following. By the definition of $[x_{\min}; r_{\min}]$, every $[x; r] \in P$ satisfies $\|x_{\min} - x^k\|^2 + \|r_{\min}\|^2 \leq \|x - x^k\|^2 + \|r\|^2$; in particular,

$$\|x_{\min} - x^k\|^2 + \|r_{\min}\|^2 \leq \|x^{D^k}_{\min} - x^k\|^2 + \|r^{D^k}_{\min}\|^2.$$

On the other hand, $[x^{D^k}_{\min}; r^{D^k}_{\min}]$ satisfies

$$\|x^{D^k}_{\min} - x^k\|^2 + \|r^{D^k}_{\min}\|^2_{D_m^k} \leq \|x_{\min} - x^k\|^2 + \|r_{\min}\|^2_{D_m^k}.$$

Therefore, considering that $\|r^{D^k}_{\min}\|^2_{D_m^k} = \|r^{D^k}_{\min}\|^2 + \|r^{D^k}_{\min}\|^2_{\alpha^k}$, and replacing in the former expression, we get

$$\|x^{D^k}_{\min} - x^k\|^2 + \|r^{D^k}_{\min}\|^2 + \|r^{D^k}_{\min}\|^2_{\alpha^k} \leq \|x_{\min} - x^k\|^2 + \|r_{\min}\|^2 + \|r_{\min}\|^2_{\alpha^k}.$$

Hence, $\|r^{D^k}_{\min}\|^2_{\alpha^k} < \|r_{\min}\|^2_{\alpha^k}$. Then we observe that $\sum_i \alpha_i^k ((r^{D^k}_{\min})_i^2 - (r_{\min})_i^2) < 0$, and this shows that a subset of components of the residual $r^{D^k}_{\min}$ exists whose magnitudes are significantly less than those of $r_{\min}$, and whose contribution outweighs that of the remaining components. The results on the convergence of the IVOP algorithm are given in Lemmas 3 and 4 of Appendix B.

4 Numerical experiences

The objectives of the following numerical experiences are twofold. The first case aims to illustrate the behavior of our algorithms in the solution of image reconstruction problems. In this case, aiming at displaying the quality of the reconstructed image using different methods, we compare our results with those obtained with ART [5] and CAV [6]. Those algorithms have been used with the same purpose in recent papers. The second case aims to compare our algorithms (Sect. 4.1.1) with other methods, reporting the number of iterations needed for satisfying the convergence conditions and the corresponding CPU time. Here we used the following methods: Landweber (LANW) [16] with relaxation parameter, Double ART (DART) [2], and Kaczmarz extended (KE) [18], which are simultaneous and sequential projection methods, respectively. All methods were implemented sequentially, and the experiences were run on a PC AMD Sempron 2.6 with 224 MB RAM. In the implementations of IOP and IVOP we consider:

1. The parameter γ: $\gamma = 10^{-2}$ in the initial iteration (k = 0), then $\gamma = 0.5$ (k > 0).
2. The matrix $D_m$: in IOP equal to $I_m$; in IVOP, $D_m^k = I_m + \mathrm{diag}(\alpha^k)$, with $\alpha_i^k = (\max(1, |r_i^0|) + \|a_i\|/n_i)/2^k$, where $n_i$ is the number of nonzero elements of the ith row of A (a sketch of this construction is given after this list).
3. The stopping condition: if $|\,\|r^{k+1}\|_{D_m} - \|r^k\|_{D_m}| < 10^{-4} \max(\|r^k\|_{D_m}, 1)$.

The matrix $D_m^k$ used in the implementation of IVOP was the one that gave the best performance on the SNARK93 problems with regard to the descent of the norm of the residual, among several alternatives. Those alternatives arose from the idea of penalizing more strongly the constraints having a higher average initial residual ($(r_i^0)^2/n_i$) or a larger average length ($\|a_i\|^2/n_i$). For the class of SNARK93 problems each $a_{ij}$ represents the length of the ith ray within the jth cell. This means that the higher the previous quotient, the more meaningful is the information provided by that ray with regard to the cells it intersects. Therefore, the corresponding equation should have a higher weight.
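A sketch of the IVOP weight construction of item 2 follows. The exact formula is hard to recover from the source, so read it as an assumption: we take $\alpha_i^k = (\max(1, |r_i^0|) + \|a_i\|/n_i)/2^k$, with the name `ivop_weights` ours.

```python
import numpy as np

def ivop_weights(A, r0, k):
    """Assumed reading of the IVOP experiments' weights:
    alpha_i^k = (max(1, |r_i^0|) + ||a_i|| / n_i) / 2^k, with n_i the
    number of nonzeros of row i. Returns the diagonal of
    D_m^k = I_m + diag(alpha^k)."""
    n_i = np.maximum((A != 0).sum(axis=1), 1)
    alpha0 = np.maximum(1.0, np.abs(r0)) + np.linalg.norm(A, axis=1) / n_i
    return 1.0 + alpha0 / 2.0**k
```

Note that the weights decrease geometrically with the outer iteration counter k, which realizes the requirement $\lim_{k\to\infty} \alpha_i^k = 0$ of Sect. 3.2.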

4.1 Experimental results on an image reconstruction problem from projections

We compared our algorithms using an image reconstruction problem from projections. All methods were implemented sequentially using SNARK93 [1], which is composed of a number of modules written in FORTRAN 77 whose aim is to test new algorithms for image reconstruction problems. The SnarkDisplay module allows one to visualize both the phantoms (objects to be reconstructed) and the obtained reconstructions in each iteration. Another useful tool is SnarkEval (C++), which graphically presents the curves of the evaluated parameters and compares the results obtained with different algorithms. We show the performance of the algorithms on the reconstruction of the Herman head phantom [11], defined by a set of ellipses, with a specific attenuation value attached to each original elliptical region. As a consequence of the discretization of the image reconstruction problem by X-ray transmission, a Cartesian grid of square image elements called pixels is defined in the region of interest in such a way that it covers the totality of the image to be reconstructed. It is assumed that the attenuation function of the X-rays takes a constant value $x_j$ in the jth pixel for $j = 1, 2, \ldots, n$. It is also assumed that the length of the intersection of the ith ray with the jth pixel, denoted by $a_{ij}$ for all $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n$, represents the weight of the contribution of the jth pixel to the total attenuation of energy along the ith ray. The physical measurement of the total attenuation along the ith ray, denoted by $b_i$, represents the line integral of the unknown function along the ray's path. Therefore, in the discrete model the line integral turns out to be a finite sum, and the model is given by the linear system $Ax = b$. This system is very large because n (number of pixels) and m (number of rays) are each of the order of $10^5$ in order to get high resolution images. The A matrix is highly sparse, with less than 1% of the entries different from zero, because only a few pixels have a non-empty intersection with a particular ray. The systems (see Censor et al. [6] for a more complete description) are generally overdetermined and inconsistent, because in the fully discretized model each line integral of the attenuation along the ith ray is approximated by a finite sum. With a suitable selection of the variables that define the geometry of the data [1], such as angles between rays, the distances between rays when they are parallel, the localization of the sources, etc., the generated problems have full rank matrices.

Fig. 1 Herman head phantom

4.1.1 Comparisons

Different test cases, using the Herman head phantom (Fig. 1), were run. Those tests are the same as those appearing in [6], arising from considering different numbers of projections and of rays per projection, thus leading to systems with different numbers of variables and equations, as shown in Table 1. The performance of the IOP and IVOP algorithms is compared with that of CAV2, with relaxation parameter $\lambda_k = 2$, and ART with $\lambda_k = 0.1$. This

Table 1 Test cases

Case    m       n       Image size  Projections  Rays
Case 2  26425   13225   115 × 115   151          175
Case 3  126655  119025  345 × 345   365          347
Case 4  232275  119025  345 × 345   475          489

Fig. 2 Relative error for case 2

particular choice of the parameters is the one reported by Censor et al. [6] as the one which gave the best results. It is necessary to point out that every iteration of the IOP and IVOP algorithms usually requires more than one inner iteration of ACCIM, which in turn demands using all the equations of the system, as the other algorithms do. In general we can observe that IOP and IVOP need few inner iterations at the beginning, while that number increases when getting close to the solution. All experiments used $x^0 = 0$. We show graphs that compare the four algorithms in cases 2–4. Figures 2, 3 and 4 show the curves representing the relative error of the density with regard to the phantom's density, for each algorithm and up to the 20th iteration. In the following we explain the meaning of this variable. Let S denote the set of pairs (i, j) such that the pixel in the ith row and jth column is in the region of interest of the phantom.

Fig. 3 Relative error for case 3

We use $\rho_{ij}^k$ to denote the density assigned to the pixel in the ith row and jth column of the reconstruction after k iterations, if $k \geq 1$, and the density in the corresponding pixel of the phantom if $k = 0$. We denote by $n = N^2$ the total number of pixels of the grid, and by $\rho^0 = \sum_{(i,j) \in S} \rho_{ij}^0$ the total density of the phantom. The relative error is

$$R^k = \frac{\sum_{(i,j) \in S} |\rho_{ij}^k - \rho_{ij}^0|}{\rho^0}.$$

We can observe a very efficient behavior of IVOP, because in a few iterations it reaches the minimum error with regard to the sought image within the first 20 iterations. IVOP obtained the least-error image in the third outer iteration (17 inner iterations) for cases 3 and 4, and in the second outer iteration (7 inner iterations) for case 2. In Figs. 5, 6 and 7 we compare the quality of the image reconstructed by each algorithm when it reaches the image with least relative error, among the first 20 iterations, with regard to the sought image. The reconstructed image given by IVOP and IOP in case 3 is better than those obtained by the other algorithms. In case 4, with a finer grid, the quality is similar to the one obtained with ART. The CAV2 algorithm requires a higher number of iterations for achieving that quality.
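The relative error $R^k$ is straightforward to compute; the sketch below is our illustration (function and argument names are ours), using a boolean mask for the region of interest S.

```python
import numpy as np

def relative_error(rho_k, rho_0, mask):
    """Relative error R^k: sum over the region of interest S (boolean
    mask) of |rho^k_ij - rho^0_ij|, divided by the total phantom
    density rho^0 = sum over S of rho^0_ij."""
    return np.abs(rho_k[mask] - rho_0[mask]).sum() / rho_0[mask].sum()
```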

Fig. 4 Relative error for case 4

In Table 2 we give the CPU times required by each algorithm for reaching the least relative error solution. IOP and IVOP use more CPU time than ART in this sequential implementation, although the quality of the reconstructed image is better in case 3. Such a difference should vanish if the parallel properties of IVOP and IOP are exploited. For IOP and IVOP we give the number of inner iterations/number of outer iterations.

4.2 Numerical results (second case)

Here we compare our algorithms with the LANW method, using the relaxation parameter $\gamma = 2/L$, where L stands for an estimation of the maximum eigenvalue of $A^T A$ given in [2], and with the Double ART algorithm (DART) [2]. The stopping criterion for Landweber's algorithm is the same as the one for IVOP. The stopping criteria for the two processes involved in DART were the magnitudes of the residuals of the respective consistent systems $A^T w = 0$ and $Ax = b - w_f$, where $w_f$ is the obtained solution of the first one. If $Rn_k$ is the residual in the kth iteration, the algorithm stops if $Rn_k < 10^{-4}$ for solving $A^T w = 0$, and if $Rn_k < 10^{-4}$ for $Ax = b - w_f$. The test problems used for these numerical experiences are some of those generated by SNARK93.

Fig. 5 Case 2: comparison of the least relative errors for the reconstructed image

In Table 3 the dimensions of the test problems are shown, using m for the number of rows, n for the number of columns and nnul for the number of nonzero elements in the generated matrix. The results obtained with DART were extremely sensitive to the precision required for computing the solution $w_f$ of $A^T w = 0$. For this reason the problems were also solved with the KE algorithm [18] which, given $w^{k-1}$ and $x^{k-1}$, computes $w^k$ by performing a complete Kaczmarz cycle for $A^T w = 0$, and adapts $x^k$ by making a similar complete cycle in $Ax = b - w^k$, restarting at $x^{k-1}$. The initial point for solving $A^T w = 0$ is $w^0 = b$. In this case, the stopping criterion used is: if $\|Ax^k - b + w^k\| < 10^{-4}$. This is less strict than the one used by the author in [18], but we preferred this choice because it is enough for our comparisons. In Table 4 we show, for each algorithm, the number of iterations, the final residual, and the CPU time in seconds required for reaching convergence.

Fig. 6 Case 3: comparison of the least relative errors for the reconstructed image

For the methods IOP and IVOP we also include the total number of inner iterations/outer iterations. For DART we report the number of iterations needed for reaching the termination criterion for each one of the systems. It is possible to observe that the LANW algorithm leads to a final residual greater than those of the other algorithms, something that seems to be related to its convergence rate and to the fact that the stopping criterion is not too tight. Anyhow, we believe this is enough for analyzing the algorithm's behavior. The final residuals are of the same order for all tested algorithms, but the number of iterations may vary drastically if one seeks a lower feasible residual. In Table 4 it is possible to observe that both IOP and IVOP need a few outer iterations to converge with a residual similar to those of the other methods. The CPU time required for obtaining those residuals is less than that consumed by the other algorithms. Every inner iteration of the new algorithms is equivalent to a cycle through all equations of the system, as in LANW.

Fig. 7 Case 4: comparison of the least relative errors for the reconstructed image

Table 2 CPU time required for reaching the least relative error

        Method  Least relative error (iter/20)  CPU time (s)
Case 2  ART     6                               3.4
        CAV2    20                              20.5
        IVOP    7/2                             5.9
        IOP     17/8                            8.8
Case 3  ART     10                              56.6
        CAV2    20                              244.3
        IVOP    17/3                            130.0
        IOP     23/8                            149.2
Case 4  ART     10                              99.3
        CAV2    20                              433.5
        IVOP    17/3                            239.1
        IOP     23/8                            257.8

Table 3 Numerical tests

Problem  m      n      nnul
B1       3254   9614   99598
B2       1650   961    51342
B7       27376  9025   2808137
B8_1     79578  13225  1282239
B8_2     26425  13225  2564640

Table 4 Numerical tests

Problem/method        IOP     IVOP    LANW    DART         KE
B1     Iter           63/11   67/9    181     81 + 52      121
       Rn_k           7.00    7.00    7.05    7.00         7.00
       CPU            0.33    0.38    1.10    0.99         1.92
B2     Iter           215/58  201/52  410     369 + 444    968
       Rn_k           5.02    5.02    5.16    5.01         5.01
       CPU            0.55    0.60    1.48    3.29         8.13
B7     Iter           184/24  174/13  261     10900 + 52   116
       Rn_k           67.60   67.53   69.93   67.51        67.52
       CPU            24.28   24.33   35.65   3018.54      49.75
B8_1   Iter           142/14  174/15  373     254 + 384    848
       Rn_k           3.58    3.58    3.64    3.58         3.57
       CPU            78.71   103.59  211.46  520.17       445.03
B8_2   Iter           558/87  605/93  604     2037 + 412   804
       Rn_k           1.45    1.45    1.59    1.44         1.44
       CPU            85.74   94.97   94.75   542.38       343.52

On the contrary, each iteration of KE requires two Kaczmarz cycles. This explains why in Table 4 it uses fewer iterations, while the CPU time is greater than that used by IOP and IVOP. DART very often uses many iterations for solving $A^T w = 0$, but just a few for $Ax = b - w_f$. This imbalance arises from the termination criterion used for solving $A^T w = 0$. We believe that KE is very efficient for solving small problems but not for larger ones. Aiming at exhibiting the behavior of the different algorithms with regard to the reduction of the norm of the residuals in the first iterations, we compare in Table 5 their magnitudes after a prescribed number of iterations (Iter_p). After knowing the number of iterations required by IOP and IVOP for completing the first, second and third outer iteration, we took the maximum of both as Iter_p. In this table we excluded DART because, before solving the compatible system $Ax = b - w_f$, it has to solve $A^T w = 0$, performing the necessary number of iterations until getting convergence; therefore, the iterative procedures cannot be compared. We denote by $Rn_0$ the Euclidean norm of the initial residual. From the comparisons in Table 5 we see that IVOP reduces the norm of the residual in the first outer iterations more rapidly than any other algorithm, although it uses a greater number of inner iterations than IOP. Also, KE needs twice the number of cycles per iteration, and therefore, for the same number of iterations, the corresponding CPU time practically duplicates those of IOP and IVOP.

Table 5 Comparing the descent of the residual norms (entries are residual norm/number of iterations)

Problem/method   IOP       IVOP      Iter_p  LANW      KE
B1               13.50/6   11.59/8   8       19.11/8   12.26/8
Rn_0 = 128.20    8.46/11   8.40/14   14      13.54/14  8.31/14
                 7.50/17   7.26/20   20      11.07/20  7.28/20
B2               20.77/3   16.87/4   4       30.60/4   144.38/4
Rn_0 = 218.2     10.51/6   9.13/7    7       18.36/7   55.72/7
                 7.46/10   7.07/11   11      13.47/11  26.43/11
B7               92.28/9   81.40/12  12      89.17/12  324.50/12
Rn_0 = 772.56    76.18/17  74.05/20  20      82.79/20  153.89/20
                 69.85/27  70.03/29  29      79.53/29  82.13/29
B8_1             39.26/3   28.60/4   4       46.57/4   1322.26/4
Rn_0 = 435.54    26.51/4   16.95/8   8       27.27/8   891.98/8
                 16.02/7   11.75/10  10      32.22/10  829.33/10
B8_2             19.37/3   14.16/4   4       24.71/4   326.25/4
Rn_0 = 232.01    11.65/6   8.93/7    7       16.03/7   229.31/7
                 6.83/8    5.92/10   10      12.29/10  176.13/10

As a conclusion, we can say the new algorithms give very good results when compared with the other ones, and they are also very useful for image reconstruction problems because they reduce the norm of the residual very rapidly in a few iterations.

5 Conclusions

The general acceleration scheme within the PAM framework, chosen for dealing with the inconsistent systems presented in this paper, turned out to be very efficient when applied to different projection algorithms. The ACCIM algorithm used for computing IOP can be easily implemented in parallel and can also be extended to consider blocks of equalities, in a similar way as was done for the ACIPA algorithm in [9] for linear inequalities. That extension will be presented in a forthcoming paper, together with the results of its application to other problems, as well as several alternatives for choosing the matrices $D_m$ in more general contexts. The practical consequences of the new technique are far reaching for real-world reconstruction problems in which iterative algorithms are used, and in other fields where there is a need to solve large and unstructured systems of linear equations.

Acknowledgments To the unknown referees for their very interesting comments and suggestions.

A Appendix

A.1 Properties of the ACCIM algorithm

First, we will point out some properties of the ACCIM algorithm, which is the basis for computing approximate solutions of problem (12) in every iteration of Algorithms 3 and 4.

Those results are needed for proving the convergence of the new algorithms. Given a consistent system $\bar{A} y = b$, where $\bar{A} \in \mathbb{R}^{m\times n}$, we denote by $y^*$ any solution of it. Let $\{y^j\}$ be the sequence generated by a version of the ACCIM algorithm with a D-norm; we denote by $r^j = \bar{A} y^j - b$ the residual at each iterate $y^j$. The direction $d^j$ defined in Algorithm 2 via the equality (4) is

$$d^j = \sum_{l=1}^m w_l \left(P_l^D(y^j) - y^j\right) = -\sum_{l=1}^m w_l \frac{r_l^j}{\|\bar{a}_l\|^2_{D^{-1}}} D^{-1} \bar{a}_l. \qquad (26)$$

From condition (7) we know that at each iterate $y^j \neq y^*$, the direction $\hat{d}^j = P_{v^\perp}(d^j)$, where $v = \hat{d}^{j-1}$, satisfies

$$(\hat{d}^j)^T D (y^j - y^* + \lambda_j \hat{d}^j) = 0, \qquad (27)$$

and therefore the next iterate $y^{j+1} = y^j + \lambda_j \hat{d}^j$ satisfies $(\hat{d}^j)^T D (y^{j+1} - y^*) = 0$. From the definition of $\hat{d}^j$ we know that $\hat{d}^j = d^j - \frac{(\hat{d}^{j-1})^T D d^j}{\|\hat{d}^{j-1}\|^2_D} \hat{d}^{j-1}$; then, multiplying $\hat{d}^j$ by $D(y^* - y^j)$ and considering that by (27) $(y^* - y^j)^T D \hat{d}^{j-1} = 0$, we get that $(\hat{d}^j)^T D (y^* - y^j)$ coincides with $(d^j)^T D (y^* - y^j)$. Also, using the formula of $\hat{d}^j$ and multiplying it by $D \hat{d}^j$, it follows that $(\hat{d}^j)^T D \hat{d}^j$ coincides with $(d^j)^T D \hat{d}^j$, since $\hat{d}^j$ is D-orthogonal to $\hat{d}^{j-1}$. Thus, using these equalities in (27), we obtain

$$(d^j)^T D (y^* - y^j) - \lambda_j (d^j)^T D \hat{d}^j = 0. \qquad (28)$$

Therefore, the next iterate $y^{j+1} = y^j + \lambda_j \hat{d}^j$ is such that $y^{j+1} - y^*$ is D-orthogonal to both directions $d^j$ and $\hat{d}^j$.

Lemma 2 If $y^j \neq y^*$, $j > 0$, is generated by the ACCIM algorithm, then
(i) $d^j$ is D-orthogonal to $y^* - y^i$ for all $i < j$;
(ii) $\hat{d}^j$ is D-orthogonal to $\hat{d}^i$ and $d^i$ for all $i < j$;
(iii) furthermore, $y^* - y^j$ is D-orthogonal to $\hat{d}^i$ for all $i < j$, and as a consequence is also D-orthogonal to $y^j - y^0$.

Proof For $j = 1$, from (27) we know that $(d^0)^T D (y^1 - y^*) = 0$, where $y^1 = y^0 + \lambda_0 d^0$. Then, using the definition of $d^1 = -\sum_{i=1}^m w_i \frac{r_i^1}{\|\bar{a}_i\|^2_{D^{-1}}} D^{-1} \bar{a}_i$, we see that $(d^1)^T D (y^* - y^0)$ coincides with $(d^0)^T D (y^1 - y^*) = 0$; hence (i) follows. Also, by the definition of $\hat{d}^1$, (ii) holds. Assuming that (i) and (ii) hold for a given $j \geq 1$, we shall prove that they are also valid for $j+1$. Thus, we assume that $d^j$ is D-orthogonal to $y^* - y^i$ for all $i < j$, and $\hat{d}^j$ is D-orthogonal to $\hat{d}^i$ and $d^i$ for all $i < j$. At $y^{j+1} = y^j + \lambda_j \hat{d}^j$, the direction is $d^{j+1} = -\sum_{l=1}^m w_l \frac{r_l^{j+1}}{\|\bar{a}_l\|^2_{D^{-1}}} D^{-1} \bar{a}_l$,

and considering that $r^{j+1} = r^j + \bar{A}(y^{j+1} - y^j)$, we get

$$d^{j+1} = d^j - \sum_{l=1}^m w_l \frac{D^{-1} \bar{a}_l \bar{a}_l^T}{\|\bar{a}_l\|^2_{D^{-1}}} (y^{j+1} - y^j). \qquad (29)$$

Multiplying (29) by $D(y^* - y^j)$, we obtain, by the definition of $y^{j+1}$, by the obvious equality $r_l^j = \bar{a}_l^T (y^j - y^*)$, and by (26) and (28), that $(d^{j+1})^T D(y^* - y^j)$ coincides with

$$(d^j)^T D(y^* - y^j) - \sum_{l=1}^m w_l \frac{(y^j - y^*)^T \bar{a}_l \, \bar{a}_l^T (y^{j+1} - y^j)}{\|\bar{a}_l\|^2_{D^{-1}}},$$

which is equal to $(d^j)^T D(y^* - y^j) + \lambda_j \left(\sum_{l=1}^m w_l \frac{r_l^j}{\|\bar{a}_l\|^2_{D^{-1}}} \bar{a}_l\right)^T \hat{d}^j$ and to $(d^j)^T D(y^* - y^j) - \lambda_j (\hat{d}^j)^T D d^j = 0$. Thus, $d^{j+1}$ is D-orthogonal to $y^* - y^j$. Moreover, multiplying (29) by $D(y^* - y^i)$, for $i < j$, due to the inductive hypothesis we know that $(d^j)^T D(y^* - y^i) = 0$ and $(d^i)^T D \hat{d}^j = 0$, and therefore $d^{j+1}$ is D-orthogonal to $y^* - y^i$ for $i < j$. Hence, we get that $d^{j+1}$ is D-orthogonal to $y^* - y^i$ for $i < j+1$. By definition, $\hat{d}^{j+1} = d^{j+1} - \frac{(d^{j+1})^T D \hat{d}^j}{\|\hat{d}^j\|^2_D} \hat{d}^j$, which means that $\hat{d}^{j+1}$ is D-orthogonal to $\hat{d}^j$. Furthermore, from (i) we know that $d^{j+1}$ is D-orthogonal to $y^* - y^i$ for $i < j+1$; then it is also D-orthogonal to $y^i - y^{i-1}$ for $i < j+1$. Then, considering that $y^i - y^{i-1} = \lambda_{i-1} \hat{d}^{i-1}$, it follows that $d^{j+1}$ is D-orthogonal to $\hat{d}^{i-1}$ for $i < j+1$. Then, using the inductive hypothesis that $\hat{d}^j$ is D-orthogonal to $\hat{d}^i$ for all $i < j$, the conclusion is that $\hat{d}^{j+1}$ is D-orthogonal to $\hat{d}^i$ for all $i < j+1$. Furthermore, since $\hat{d}^i = d^i - \frac{(d^i)^T D \hat{d}^{i-1}}{\|\hat{d}^{i-1}\|^2_D} \hat{d}^{i-1}$ for all $i < j+1$, using that $\hat{d}^{j+1}$ is D-orthogonal to $\hat{d}^i$ and $\hat{d}^{i-1}$, we also get that $\hat{d}^{j+1}$ is D-orthogonal to $d^i$ for all $i < j+1$. Finally, to see (iii) we use the fact that, as a consequence of (27), $\hat{d}^i$ is D-orthogonal to $y^{i+1} - y^*$ for all $i < j$. On the other hand, using (ii) we know that $\hat{d}^i$ is also D-orthogonal to $\hat{d}^s$ for $i < s < j$; then $\hat{d}^i$ is also D-orthogonal to $y^j - y^{i+1}$. Hence, for all $i < j$, $\hat{d}^i$ is D-orthogonal to $y^j - y^*$. Therefore, $y^j - y^0 = \sum_{i=0}^{j-1} \lambda_i \hat{d}^i$ is D-orthogonal to $y^j - y^*$.

A.2 Applying ACCIM to solve $Ax - r = b$

We will consider in the following the application of the ACCIM algorithm for solving $Az - I_m \mu = b$, required in every step of Algorithms 3 and 4. Given $p^k$ and $q^k = [x^k; 0]$, $k \geq 0$, ACCIM computes an approximation $[z^j; \mu^j]$ to the projection $p^{D^k}_{\min}(q^k)$. In Algorithm 3 the matrix $D^k = D$ is constant [see (11)], while in Algorithm 4 the matrix is variable [see (25)]. Given $q^k = [x^k; 0] \in Q$, in order to solve (12), from the starting point $[z^0; \mu^0] = q^k = [x^k; 0]$, at each $[z^j; \mu^j]$ we denote by $s^j$ the residual

$s^j = Az^j - \mu^j - b$. The direction $d^j \in \mathbb{R}^{n+m}$ can be written as

$$d^j = -\sum_{i=1}^m w_i \frac{s_i^j}{\|\bar{a}_i\|^2_{(D^k)^{-1}}} (D^k)^{-1} \bar{a}_i, \qquad (30)$$

with $\bar{a}_i = [a_i; -e_i]$ the ith column of $[A, -I_m]^T$, where $e_i$ denotes the ith column of $I_m$. The square of the norm of each row of the matrix $[A, -I_m]$, induced by the inverse of $D^k$, is $1 + 1/(D_m^k)_i$ if $\|a_i\| = 1$, where $(D_m^k)_i$ denotes the ith element of the diagonal of $D_m^k$. Hence, the direction $d^j = [d_1^j; d_2^j]$ has its first n components given by

$$d_1^j = -\sum_{i=1}^m w_i \frac{(D_m^k)_i \, s_i^j}{1 + (D_m^k)_i} a_i, \qquad (31)$$

and the next m components by

$$d_2^j = \sum_{i=1}^m w_i \frac{s_i^j}{1 + (D_m^k)_i} e_i. \qquad (32)$$

We choose for this application of ACCIM the values $w_i = 1 + (D_m^k)_i$, because they privilege the rows with greatest $1 + (D_m^k)_i$. Thus, at each iterate $[z^j; \mu^j]$, the directions are

$$d_1^j = -\sum_{i=1}^m (D_m^k)_i s_i^j a_i = A^T D_m^k (-s^j) \quad \text{and} \quad d_2^j = \sum_{i=1}^m s_i^j e_i = s^j. \qquad (33)$$

This direction $d^j$ satisfies $d_1^j = A^T D_m^k (-d_2^j)$. The direction $\hat{d}^j = [\hat{d}_1^j; \hat{d}_2^j]$ behaves similarly, with $\hat{d}_1^j = A^T D_m^k (-\hat{d}_2^j)$, an expression that follows from considering that

$$\hat{d}_1^j = d_1^j - \frac{(\hat{d}^{j-1})^T D^k d^j}{\|\hat{d}^{j-1}\|^2_{D^k}} \hat{d}_1^{j-1}, \qquad \hat{d}_2^j = d_2^j - \frac{(\hat{d}^{j-1})^T D^k d^j}{\|\hat{d}^{j-1}\|^2_{D^k}} \hat{d}_2^{j-1},$$

and using that $\hat{d}^0$ coincides with $d^0$. Hence, for each $j > 0$, $[z^j; \mu^j]$ satisfies $z^j - z^0 = A^T D_m^k (-\mu^j)$.

Remark 6 We now describe the properties of the accepted approximation $\hat{p} = [z^j; \mu^j]$ in the iterations of Algorithms 3 and 4.
(i) From (iii) of the previous Lemma we know that, for every solution $[z^*; \mu^*]$ of $Az - \mu = b$, $[z^*; \mu^*] - [z^j; \mu^j]$ is $D^k$-orthogonal to $[z^j; \mu^j] - [z^0; \mu^0]$. Thus, for all $p \in P$, we have $(p - \hat{p})^T D^k (\hat{p} - q^k) = 0$.
(ii) From (i) of the same Lemma we know that, for every $[z^j; \mu^j]$, $j > 0$, the direction $d^j$ satisfies $(d^j)^T D^k ([z^*; \mu^*] - [z^0; \mu^0]) = 0$. Then, since $p^k \in P$ and $q^k = [z^0; \mu^0]$, in particular $(d^j)^T D^k (p^k - q^k) = 0$. Hence, using the expression of $d^j$ in (33), and considering that $p^k - q^k = [0; r^k]$, we get that $s^j$ is $D_m^k$-orthogonal to $r^k$.
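With the weights $w_i = 1 + (D_m^k)_i$, the direction (33) needs no explicit augmented matrix: the x-part is one sparse matrix-vector product and the residual part is the residual itself. A sketch (ours; function and argument names are assumptions):

```python
import numpy as np

def augmented_direction(A, dm, z, mu, b):
    """Direction (33) for the augmented system A z - mu = b, with
    weights w_i = 1 + (D_m^k)_i and normalized rows of A; dm is the
    diagonal of D_m^k. Returns the x-part d_1 and the residual part d_2."""
    s = A @ z - mu - b          # current residual s^j
    d1 = A.T @ (dm * (-s))      # first n components: A^T D_m^k (-s^j)
    d2 = s                      # last m components: s^j
    return d1, d2
```

This is the computational reason for the weight choice: the per-row denominators of (31)/(32) cancel, so each inner iteration costs essentially two products with the sparse matrix A.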

B Convergence of the algorithms

Aiming at proving that the sequences $\{p^k\}$ generated by the two new algorithms are convergent, we first describe the properties shared by both. Since we will study the relationship between two consecutive iterates $p^k$ and $p^{k+1}$, depending on the D-norm in Algorithm 3 and on the variable $D^k$-norm in Algorithm 4, we will use the more general notation $D^k$ for both. We will distinguish, in relation to both algorithms, the following sets: $L^{D_m}_{sq}$, the set of solutions to problem (2),

$$L^{D_m}_{sq} = \{x^* \in \mathbb{R}^n : r^* = Ax^* - b \text{ satisfies } A^T D_m r^* = 0\}, \qquad (34)$$

and the corresponding

$$S^{D_m} = \{p^* : p^* = [x^*; r^*] \in P \text{ such that } x^* \in L^{D_m}_{sq}\}, \qquad (35)$$

$D_m$ being the matrix of (11). In particular, $D_m = I_m$ corresponds to the set of solutions of the classical least squares problem. For the sequence given by Algorithm 3 we will prove convergence to an element of the set $S^{D_m}$, with $D_m$ the matrix arising from (11), while for the sequence given by Algorithm 4 we will prove convergence to an element of the set of solutions of the classical least squares problem ($D_m = I_m$). Given an initial point $q^0 = [x^0; 0] \in Q$, for both algorithms we consider

$$\bar{x}^* = \arg\min_{x^* \in L^{D_m}_{sq}} \|x^0 - x^*\|^2 \qquad (36)$$

and denote $\bar{p}^* = [\bar{x}^*; \bar{r}^*] \in S^{D_m}$, which satisfies

$$\bar{p}^* = \arg\min_{p^* \in S^{D_m}} \|q^0 - p^*\|^2_D, \qquad (37)$$

because $\|q^0 - p^*\|^2_D = \|x^0 - x^*\|^2 + \|r^*\|^2_{D_m}$.

Lemma 3 Let $\{p^k\} = \{[x^k; r^k]\}$ be the sequence generated by either Algorithm 3 or 4. Then:
(i) $p^k = [x^k; r^k]$ and $p^{k+1} = [x^{k+1}; r^{k+1}]$ satisfy $\|r^{k+1}\|^2_{D_m^k} \leq \|r^k\|^2_{D_m^k} - (1-\gamma) \|p^k - p^{D^k}_{\min}(q^k)\|^2_{D^k}$.
(ii) The sequence $\{\|r^k\|_{D_m^k}\}$ is decreasing and bounded, therefore it converges.
(iii) The following three sequences tend to zero: $\{\|p^k - p^{D^k}_{\min}(q^k)\|^2_{D^k}\}$, $\{\|p^{k+1} - p^{D^k}_{\min}(q^k)\|^2_{D^k}\}$, and $\{\|p^{k+1} - p^k\|^2_{D^k}\}$.
(iv) The sequence $\{\|A^T D_m^k r^k\|\}$ goes to zero.
(v) If $\bar{x}^* = \arg\min_{x^* \in L^{D_m}_{sq}} \|x^* - x^k\|^2$, then $\bar{x}^*$ also fulfills that property with regard to $x^{k+1}$.

Proof As a consequence of the definition of $p^{k+1}$ and the corresponding $q^{k+1}$, we get $\|r^{k+1}\|^2_{D_m^k} \leq \|p^{k+1} - q^k\|^2_{D^k}$. Furthermore, due to the ACCIM properties in Lemma 2 (iii) of Appendix A, $\|p^{k+1} - q^k\|^2_{D^k} = \|p^{k+1} - \hat{p}\|^2_{D^k} + \|\hat{p} - q^k\|^2_{D^k}$. From the condition (17) for accepting $\hat{p}$, $\|p^{k+1} - \hat{p}\|^2_{D^k} \leq \gamma \|\hat{p} - p^k\|^2_{D^k}$. Then $\|p^{k+1} - q^k\|^2_{D^k} \leq \gamma \|\hat{p} - p^k\|^2_{D^k} + \|\hat{p} - q^k\|^2_{D^k}$. Therefore, as a consequence of Lemma 2 (iii) of Appendix A, $\|\hat{p} - q^k\|^2_{D^k} = \|r^k\|^2_{D_m^k} - \|\hat{p} - p^k\|^2_{D^k}$, and then we get $\|q^k - p^{k+1}\|^2_{D^k} \leq \gamma \|\hat{p} - p^k\|^2_{D^k} + \|r^k\|^2_{D_m^k} - \|\hat{p} - p^k\|^2_{D^k}$. Hence, $\|q^k - p^{k+1}\|^2_{D^k} \leq \|r^k\|^2_{D_m^k} - (1-\gamma) \|\hat{p} - p^k\|^2_{D^k}$. Then, considering that $\|\hat{p} - p^k\|^2_{D^k} = \|\hat{p} - p^{D^k}_{\min}(q^k)\|^2_{D^k} + \|p^{D^k}_{\min}(q^k) - p^k\|^2_{D^k}$, we get

$$\|r^{k+1}\|^2_{D_m^k} \leq \|q^k - p^{k+1}\|^2_{D^k} \leq \|r^k\|^2_{D_m^k} - (1-\gamma) \|p^k - p^{D^k}_{\min}(q^k)\|^2_{D^k}.$$

By (i), if $p^{D^k}_{\min}(q^k) \neq p^k$, it follows that $\|r^{k+1}\|^2_{D_m^k} < \|r^k\|^2_{D_m^k}$. When the norm is the one induced by $D_m^k = D_m$, it follows immediately that $\{\|r^k\|_{D_m^k}\}$ is a decreasing sequence. In the case of Algorithm 4, due to the definition of the matrices $D^k$ in (25), we have $\|r^k\|^2_{D_m^k} \leq \|r^k\|^2_{D_m^{k-1}}$, and also due to (i) it satisfies $\|r^k\|^2_{D_m^{k-1}} < \|r^{k-1}\|^2_{D_m^{k-1}}$; therefore, also in this case $\{\|r^k\|_{D_m^k}\}$ is a decreasing sequence. Therefore, $\|r^k\|^2_{D_m^k}$ is decreasing and convergent. Hence, considering the inequality in (i) and the fact that $\|r^{k+1}\|^2_{D_m^{k+1}} \leq \|r^{k+1}\|^2_{D_m^k} < \|r^k\|^2_{D_m^k}$, we also obtain that $\|p^k - p^{D^k}_{\min}(q^k)\|^2_{D^k}$ tends to zero. Furthermore, since by (17) $\|p^k - \hat{p}\|^2_{D^k} \geq (1/\gamma) \|\hat{p} - p^{k+1}\|^2_{D^k}$, and considering that $\|\hat{p} - p^{k+1}\|^2_{D^k} = \|\hat{p} - p^{D^k}_{\min}(q^k)\|^2_{D^k} + \|p^{k+1} - p^{D^k}_{\min}(q^k)\|^2_{D^k}$ and $\|p^k - \hat{p}\|^2_{D^k} = \|\hat{p} - p^{D^k}_{\min}(q^k)\|^2_{D^k} + \|p^k - p^{D^k}_{\min}(q^k)\|^2_{D^k}$, we get that $\|p^k - p^{D^k}_{\min}(q^k)\|^2_{D^k} \geq \|p^{k+1} - p^{D^k}_{\min}(q^k)\|^2_{D^k}$. Hence, since $\|p^k - p^{D^k}_{\min}(q^k)\|$ goes to zero, it follows that $\|p^{k+1} - p^{D^k}_{\min}(q^k)\|^2_{D^k}$ tends to zero. Since $\|p^{k+1} - p^k\|_{D^k} \leq \|p^{k+1} - p^{D^k}_{\min}(q^k)\|_{D^k} + \|p^{D^k}_{\min}(q^k) - p^k\|_{D^k}$ and the right-hand sum tends to zero, $\|p^{k+1} - p^k\|_{D^k}$ also tends to zero. In order to prove (iv), from the KKT conditions we know that $p^{D^k}_{\min}(q^k) = [x^{D^k}_{\min}; r^{D^k}_{\min}]$ satisfies $x^{D^k}_{\min} - x^k = -A^T D_m^k r^{D^k}_{\min}$. Then, since $\|p^k - p^{D^k}_{\min}(q^k)\|^2_{D^k} = \|x^k - x^{D^k}_{\min}\|^2 + \|r^k - r^{D^k}_{\min}\|^2_{D_m^k}$, it follows that both $\|A^T D_m^k r^{D^k}_{\min}\|$ and $\|r^k - r^{D^k}_{\min}\|_{D_m^k}$ tend to zero. Then, as $\|A^T D_m^k r^k\| \leq \|A^T D_m^k (r^k - r^{D^k}_{\min})\| + \|A^T D_m^k r^{D^k}_{\min}\|$, we get that $\|A^T D_m^k r^k\|$ goes to zero. Under the hypothesis of (v) we know that $\bar{p}^* = [\bar{x}^*; \bar{r}^*] \in S^{D_m}$ satisfies $\|\bar{p}^* - q^k\|^2_{D^k} = \|x^k - \bar{x}^*\|^2 + \|\bar{r}^*\|^2_{D_m^k} \leq \|x^k - x^*\|^2 + \|r^*\|^2_{D_m^k}$ for all $x^* \in L^{D_m}_{sq}$. Due to the fact that the coordinates of $p^{k+1} = [x^{k+1}; r^{k+1}]$ and $q^{k+1} = [x^{k+1}; 0]$ are given by the accepted approximation $\hat{p} = [x^{k+1}; \mu^j]$, they inherit its properties. Thus, since for all $p \in P$, $p - \hat{p}$ is $D^k$-orthogonal to $\hat{p} - q^k$, and in particular $\|p^* - q^k\|^2_{D^k} = \|x^* - x^k\|^2 + \|r^*\|^2_{D_m^k} = \|\hat{p} - q^k\|^2_{D^k} + \|p^* - \hat{p}\|^2_{D^k}$, is the minimum