Finding the orthogonal projection of a point onto an affine subspace
Linear Algebra and its Applications 422 (2007)

Finding the orthogonal projection of a point onto an affine subspace

Ján Plesník
Department of Mathematical Analysis and Numerical Mathematics, Faculty of Mathematics, Physics and Informatics, Comenius University, Mlynská dolina, Bratislava, Slovakia

Received April 2006; accepted 6 November 2006. Available online 8 January 2007. Submitted by R.A. Brualdi.

Abstract

A simple method is proposed to find the orthogonal projection of a given point onto the solution set of a system of linear equations. This is also a direct method for solving systems of linear equations: the output of the method is either the projection or the inconsistency of the system. Moreover, in the process linearly dependent equations are also recognized. This paper is confined to theoretical foundations, computational complexity and some numerical experiments with dense matrices, although the method also allows sparsity to be exploited. The raw method could not compete with the best software packages in solving linear equations for general matrices, but it was competitive in finding projections for matrices with a small number of rows relative to the number of columns. © 2006 Elsevier Inc. All rights reserved.

AMS classification: 15A06; 15A03; 15A09; 65F05; 65F50; 65Y20

Keywords: Orthogonal projection; Linear equations

This research was supported by the Slovak Scientific Grant Agency VEGA. E-mail address: plesnik@fmph.uniba.sk

1. Introduction

One of the basic problems in linear algebra is to find the orthogonal projection proj_S(x_0) of a point x_0 onto an affine subspace S = {x : Ax = b} (cf. e.g. [2,0,,28]). This provides a special
solution of the system of equations. Assuming that A ∈ R^{m×n} has rank m, it is well known (see e.g. [2]) that for any z ∈ S one has

proj_S(x_0) = [I − A^T(AA^T)^{−1}A](x_0 − z) + z = [I − A^T(AA^T)^{−1}A]x_0 + A^T(AA^T)^{−1}b.   (1)

In general, computing inverse matrices is not easy (cf. e.g. [9,8]), and it is often recommended to solve several systems of linear equations instead. Therefore a better alternative for finding projections is sometimes to use a least squares method with constraints, more precisely, to solve the following optimization problem:

min{ ‖x_0 − x‖^2 : Ax = b }.   (2)

To solve a system of linear algebraic equations Ax = b is a fundamental problem of linear algebra. (In the literature such systems are also called simultaneous linear equations.) While the theory of such systems is part of every course in linear algebra, the solution methods dominate in courses on numerical mathematics. The methods are usually classified as direct (where a solution is obtained after a finite number of steps) and iterative (typically, after every iteration only an approximate solution is obtained, but the convergence may be fast). Probably the oldest solution method is Gaussian elimination, named after C.F. Gauss, although according to Wikipedia, the free Internet encyclopedia, the earliest inventor is the Chinese mathematician Liu Hui, who lived in the 200s A.D. (see also book [2], introductions of Chapters 1 and 3). There exist many methods for solving systems of linear equations (see e.g. papers [5,6,25] and books [3,4,7,8,2,24]). Many new methods were developed in recent years and many are still being proposed [3 5,8,2,22,26,30]. Nevertheless, the main purpose of this paper is to present a direct method for computing the orthogonal projections. The method leads to an algorithm for computing solutions to linear systems. There are many methods for systems of linear equations based (explicitly or implicitly) on projections.
For example, see papers [3,7,9,23,27] and book [6]. But none of them is the same as our method. Note that sometimes relationships between such methods are discovered [3,27,6]. In particular, Lai [20] proved that Purcell's method [23] is theoretically equivalent to the Gauss–Jordan elimination method. Our method looks like an underrelaxation method (see e.g. [6]), but it is different. The key difference between these methods and ours is the following. In an iteration of the relaxation methods one attempts to satisfy only one violated equation, ignoring or relaxing the others; in our method, all previously satisfied equations remain satisfied. This produces a finite method. Instead of projecting a point onto a hyperplane, we move along so-called equiresidual lines. Some properties of such lines are described in Section 2. The method is developed in Section 3. In Section 4 we give the computational complexity of the method, implementation remarks, and various extensions of the basic problem which are solvable by the method. Among others, our method recognizes linearly dependent equations (like elimination methods do), which is often useful [1]. This paper is limited to the theoretical foundations of the method, and exact arithmetic is assumed in all computations. No error analysis [8,29] for the case of imprecise arithmetic is given. However, our experience with the method indicates that it is satisfactory also in that case. This holds mainly for the very natural problem concerning systems of linear equations Ax = b where one asks for a vector x giving sufficiently small residuals in absolute value. It is well known that such a vector can be unsatisfactory for ill-conditioned matrices A if one asks for a vector which is sufficiently close to the exact solution (this seems to be
a rather artificial problem, but it is important in stability questions [8]). Section 5 gives the results of numerical experiments in which our method is compared with some known methods implemented in Matlab.

2. Equiresidual points and lines

In this section we give some basic properties of equiresidual lines. Let us consider a system of equations Ax = b, where the rows of A, say a_1^T,...,a_k^T ∈ R^n, are linearly independent and the components of b are b_1,...,b_k ∈ R. Such a system has a solution and can be written as follows:

a_1^T x = b_1,  a_2^T x = b_2,  ...,  a_k^T x = b_k.   (3)

To each equation of (3) a hyperplane can be assigned:

P_i = {x ∈ R^n : a_i^T x = b_i} for all i.   (4)

Let S be the solution set of (3); thus S = P_1 ∩ P_2 ∩ ... ∩ P_k. It is well known that S is an affine subspace and that there is a parallel vector subspace S_0 of dimension n − k such that for any s ∈ S we have S = s + S_0 (= {s + x : x ∈ S_0}). The residual of the ith equation at a point y is the number b_i − a_i^T y. A point y is said to be equiresidual for system (3) if it has the same residual for each equation of the system. A line L = {u + λv : λ ∈ R} is called an equiresidual line of system (3) if each point of L is equiresidual for (3). In general, a line L ⊂ R^n and an affine subspace S ⊂ R^n can either be disjoint, or intersect in exactly one point, or L ⊂ S. All these possibilities may occur also in the case when L is equiresidual and S is the solution set of (3), as one can easily see e.g. for n = 3 and k = 2. Our method of solving (3) is based on the idea of finding an equiresidual line with non-constant residual: going along such a line we reach a point of residual 0, i.e. a solution of (3). Therefore we study this notion in some depth. The following assertion is obvious.

Lemma 1. A line is equiresidual for system (3) if and only if it contains two distinct equiresidual points for (3).

Lemma 2.
A line L = {u + λv : λ ∈ R} is equiresidual for system (3) if and only if the following two conditions are satisfied:
(i) u is an equiresidual point for (3),
(ii) a_1^T v = a_2^T v = ... = a_k^T v.

Proof. (⇒) The equiresiduality of L means that

b_1 − a_1^T(u + λv) = ... = b_k − a_k^T(u + λv) for all λ ∈ R.   (5)

Putting λ = 0 we get b_1 − a_1^T u = ... = b_k − a_k^T u, which is (i). Putting λ = 1 and subtracting we get (ii). (⇐) A combination of (i) with a λ-multiple of (ii) establishes the equiresiduality of L. □
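Lemma 2 is easy to check numerically. The following NumPy snippet, a sketch with hypothetical data, builds a direction v satisfying condition (ii) and confirms that the residuals of all equations stay equal along the whole line {u + λv}:

```python
import numpy as np

# A hypothetical 2 x 3 system a_i^T x = b_i (rows linearly independent).
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 1.0])

u = np.zeros(3)                  # equiresidual point: both residuals equal b_i - 0 = 1
v = np.array([0.0, 0.0, 1.0])    # satisfies (ii): a_1^T v = a_2^T v = 1

for lam in (-2.0, 0.0, 0.5, 3.0):
    r = b - A @ (u + lam * v)    # residual vector at the point u + lam*v
    # All components of r are equal (here 1 - lam), so the whole line is
    # equiresidual (Lemma 2); since the residual varies with lam, the line
    # even has non-constant residual.
    assert np.allclose(r, r[0])
```

At λ = 1 the common residual reaches 0, i.e. the point u + v solves the system, which is exactly how the method of this paper uses such lines.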
For our aims we need equiresidual lines with non-constant residual. They are briefly called ERNC lines and can be characterized as follows.

Lemma 3. A line L = {u + λv : λ ∈ R} is an ERNC line for system (3) if and only if the following two conditions are satisfied:
(i) u is an equiresidual point for (3),
(ii) a_1^T v = ... = a_k^T v ≠ 0.

Proof. In addition to Lemma 2, one sees that for λ′ ≠ λ″ we have b_i − a_i^T(u + λ′v) ≠ b_i − a_i^T(u + λ″v) if and only if a_i^T v ≠ 0. □

Immediately from Lemmas 2 and 3 we get

Corollary 1. Let u, v, w ∈ R^n, where v ≠ 0 and both u and w are equiresidual points for (3). Then the line L = {u + λv : λ ∈ R} is an equiresidual or ERNC line for (3) if and only if the parallel line {w + λv : λ ∈ R} is an equiresidual or ERNC line for (3), respectively.

We will use special ERNC lines, namely those orthogonal to the solution set S. Such a line is called an equiresidual orthogonal (ERO, in short) line of system (3). As usual, the linear hull of a set of vectors a_1,...,a_k is the minimal vector subspace containing all the vectors; it is denoted by span{a_1,...,a_k}.

Theorem 1. A line L = {u + λv : λ ∈ R} is an ERO line for (3) if and only if the following conditions hold:
(a) u is equiresidual for (3),
(b) a_1^T v = ... = a_k^T v ≠ 0,
(c) v ∈ span{a_1,...,a_k}.

Proof. (⇒) (a) is obvious. Let s ∈ S. Then the vector subspaces S_0 = (−s) + S and V_0 = span{a_1,...,a_k} are mutually orthogonal complements in R^n and S_0 ∩ V_0 = {0}. Thus the direction vector v of L belongs to V_0, as desired in (c). By Lemma 2(ii) there is a real μ such that a_1^T v = ... = a_k^T v = μ. To establish (b) we prove that μ ≠ 0. By (c), for any vector y ∈ V_0 there are reals α_1,...,α_k such that y = α_1 a_1 + ... + α_k a_k. Assume that μ = 0. Then y^T v = α_1 a_1^T v + ... + α_k a_k^T v = 0, i.e. v is orthogonal to each vector of V_0 and thus v ∈ V_0^⊥ = S_0. According to (c) we have v ∈ V_0 and hence v ∈ V_0 ∩ S_0 = {0}, which is impossible (because v is a direction vector of a line). (⇐) By (a) and (b), L is equiresidual (Lemma 2). According to (c) we have v ⊥ S_0 and thus the line L is orthogonal to S_0.
Hence L is an ERO line. □

Theorem 2. For the solution set S of system (3) the following hold:
(i) Each point of S lies in a unique ERO line for (3).
(ii) Each ERO line of (3) contains a unique point of S.
(iii) All ERO lines of (3) are mutually parallel and their common direction vector is independent of the right-hand side vector b of (3).
Proof. (i) We want to show that for any point u ∈ S there is a non-zero vector v ∈ R^n fulfilling conditions (a)–(c) of Theorem 1. The point u is evidently equiresidual (with residual 0) for (3) and thus (a) is satisfied. Now we are going to find λ_1,...,λ_k ∈ R with v = λ_1 a_1 + ... + λ_k a_k and a_1^T v = ... = a_k^T v = μ, where μ ≠ 0. Clearly, this will fulfill (b) and (c). It is sufficient to ensure the solvability of the system

[ A^T  −I ] [ λ ]   [ 0 ]
[  0    A ] [ v ] = [ μ ],   (6)

where λ^T = (λ_1,...,λ_k), the vector 0 consists of zeros and the vector μ consists of μ's only. Let M denote the matrix of system (6). One sees that M is a non-singular square matrix and hence (6) has the unique solution

(λ, v) = M^{−1}(0, μ) = μ M^{−1}(0, 1) = μ (λ(1), v(1)),

where λ(1) and v(1) correspond to μ = 1. Clearly v(1) ≠ 0 (otherwise Av(1) = 0 ≠ 1) and also λ(1) ≠ 0 (otherwise A^T λ(1) − v(1) = −v(1) ≠ 0 would violate (6)). This proves (i).

(ii) Let L = {u + λv : λ ∈ R} be an ERO line for (3). Then there is exactly one λ_0 such that at the point u + λ_0 v the residual b_i − a_i^T(u + λ_0 v) = 0 for all i. Namely, we can take λ_0 = (b_i − a_i^T u)/(a_i^T v), which does not depend on i because conditions (a) and (b) of Theorem 1 hold. Therefore L ∩ S contains exactly one point, u + λ_0 v.

(iii) In the proof of (i) we have seen that any direction vector v(μ) of an ERO line of (3) is a non-zero multiple of one fixed vector and does not depend on u ∈ S. Thus v is independent of the right-hand side vector b of system (3), as desired. □

Theorem 3. Let x_0 ∈ R^n. Then any point x ∈ x_0 + span{a_1,...,a_k} has the same orthogonal projection onto S.

Proof. Denote V_0 = span{a_1,...,a_k}, V = x_0 + V_0, S = P_1 ∩ ... ∩ P_k and S_0 = (−s) + S for a fixed s ∈ S. As already mentioned above, the vector subspaces V_0 and S_0 are mutually orthogonal and 0 is their only common point. The affine subspaces V and S are also mutually orthogonal and intersect at a unique point y.
For each point x ∈ V the vector x − y is orthogonal to S, and hence y is the orthogonal projection of x onto S. □

Corollary 2. Let x_0 ∈ R^n and let L be an equiresidual line for (3) lying in x_0 + span{a_1,...,a_k}. Then L ∩ S consists of a single point, and it is the orthogonal projection of x_0 onto S.

Proof. As L is orthogonal to S, L ∩ S consists of a single point y, and y is the orthogonal projection of any point x ∈ L onto S. By Theorem 3, x_0 has the same projection y. □

3. Developing the solution method

Let us consider a system of m linear equations labeled (E_1),...,(E_m):

a_1^T x = b_1   (E_1),
a_2^T x = b_2   (E_2),
  ...
a_m^T x = b_m   (E_m),   (7)
Fig. 1. Illustrating the algorithm. [The figure shows the hyperplanes P_1, P_2, the points x_0, x_1, x_2, u_1, w_0, w_1, and the lines L_1, L′, L_2 with direction vectors v_1, v_2.]

where a_1,...,a_m ∈ R^n \ {0} and b_1,...,b_m ∈ R. In general, we allow dependent equations and inconsistency; our algorithm will recognize such cases. More precisely, it processes the equations sequentially: if an equation is found to be dependent on the previous ones, it is skipped and not considered further; if an equation is found to be inconsistent with the previous ones, the algorithm ends. As before, the solution set (a hyperplane) of an equation (E_i) is denoted by P_i, and the solution set of (7) (i.e. of (E_1) to (E_m)) is S = P_1 ∩ ... ∩ P_m. Moreover, we assume that a point x_0 ∈ R^n is also given, because our aim is to find the orthogonal projection of x_0 onto the set S.

The method is based on Corollary 2, which requires finding an equiresidual line (in fact an ERO line). Such a line will be found recursively, as illustrated in Fig. 1. The first ERO line L_1 for (E_1) is simply the line through x_0 orthogonal to P_1. Thus we put v_1 = a_1 and find the intersection, say x_1, of the line L_1 = {x_0 + λv_1 : λ ∈ R} and the hyperplane P_1. Evidently, the point x_1 is the orthogonal projection of x_0 onto P_1. It is also clear that L_1 is an ERO line for (E_1). To find an ERO line for (E_1) and (E_2), we find two distinct equiresidual points u_1, w_1 for (E_1) and (E_2) (Lemma 1). In L_1 we search for an equiresidual point u_1 for (E_1) and (E_2). As L_1 = {x_1 + λv_1 : λ ∈ R}, the condition of equiresiduality b_1 − a_1^T(x_1 + λv_1) = b_2 − a_2^T(x_1 + λv_1) gives a λ, and then we get

u_1 = x_1 + [(b_2 − a_2^T x_1)/((a_2 − a_1)^T v_1)] v_1.   (8)

Our example in Fig. 1 gives a non-zero denominator in (8); in general, however, this may not be the case (e.g. if equation (E_2) were the same as (E_1)). This pitfall is postponed to the end of this example and will also be discussed in the general step. To find a point w_1, we start with the point
w_0 = x_0 + a_2 and construct a line L′ through this point with direction vector v_1. In this ERO line for (E_1) we look for an equiresidual point w_1 for (E_1) and (E_2). An easy computation gives

w_1 = x_0 + a_2 + [(b_2 − b_1 − (a_2 − a_1)^T(x_0 + a_2))/((a_2 − a_1)^T v_1)] v_1.   (9)

The denominator in (9) is the same as in (8) and hence non-zero. Now we are lucky, as we get w_1 ≠ u_1, and thus these points determine an equiresidual line L_2 for (E_1) and (E_2) with direction vector v_2 = w_1 − u_1. (The other case is possible and will be discussed later.) Since L_2 ⊂ x_0 + span{a_1, a_2}, this line is orthogonal to P_1 ∩ P_2 and hence it is an ERO line for (E_1) and (E_2). By Corollary 2 the intersection x_2 of L_2 with P_1 (or P_2) is the orthogonal projection of x_0 onto the solution set P_1 ∩ P_2.

As mentioned above, other cases may also occur. First, assume that in our example equation (E_2) is the same as (E_1). This gives a zero denominator in (8) as well as in (9). In this case we can replace (E_2) by the equation (−a_2)^T x = −b_2. Now the denominator is non-zero, but we encounter another problem. Let us consider the case a_2 = a_1 and b_2 = b_1. Then we have P_1 = P_2, and by our procedure one gets u_1 = x_1 = w_1. Now x_1 ∈ P_2 and we conclude that (E_2) is linearly dependent on (E_1). For further consideration a dependent equation is deleted (skipped). Finally, assume that a_2 = a_1 and b_2 ≠ b_1. Then the hyperplanes P_1 and P_2 are parallel but distinct. In this case we observe that x_1 ∉ P_2 and conclude that (E_2) is inconsistent with (E_1).

Now we are prepared to explain how to proceed in a general step. Assume that k ≥ 2 and that we have already found the first k − 1 linearly independent equations of (7). For simplicity we denote them (E_1),...,(E_{k−1}) (all encountered dependent equations have been deleted).
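The two-equation illustration above, with the update formulas (8) and (9), can be sketched in NumPy as follows. The data a_1, a_2, b_1, b_2, x_0 are hypothetical, and the non-degenerate case with a non-zero common denominator is assumed:

```python
import numpy as np

# Hypothetical two-equation system in R^3.
a1 = np.array([1.0, 0.0, 0.0]); b1 = 1.0
a2 = np.array([0.0, 1.0, 0.0]); b2 = 2.0
x0 = np.zeros(3)

v1 = a1                                          # direction of the first ERO line L_1
x1 = x0 + (b1 - a1 @ x0) / (a1 @ v1) * v1        # projection of x0 onto P_1

d = (a2 - a1) @ v1                               # common denominator of (8) and (9)
u1 = x1 + (b2 - a2 @ x1) / d * v1                # formula (8)
w0 = x0 + a2
w1 = w0 + (b2 - b1 - (a2 - a1) @ w0) / d * v1    # formula (9)

v2 = w1 - u1                                     # direction of the ERO line L_2
x2 = u1 + (b1 - a1 @ u1) / (a1 @ v2) * v2        # intersection of L_2 with P_1
```

For these data x2 agrees with the minimum-norm solution of the two equations, i.e. the orthogonal projection of x_0 onto P_1 ∩ P_2, as Corollary 2 predicts.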
That means we have found and saved direction vectors v_1,...,v_{k−1} of the corresponding ERO lines and the last orthogonal projection x_{k−1} of x_0 onto S_{k−1} = P_1 ∩ ... ∩ P_{k−1}. Moreover, it is assumed that

a_i^T v_{k−1} = a_j^T v_{k−1} ≠ 0 whenever 1 ≤ i ≤ j ≤ k − 1,   (10)
a_j^T v_{j−1} − a_1^T v_{j−1} ≠ 0 whenever 1 < j ≤ k − 1.   (11)

In the ERO line L_{k−1} = {x_{k−1} + λv_{k−1} : λ ∈ R} for (E_1),...,(E_{k−1}) we look for an equiresidual point u_{k−1} for (E_1),...,(E_k). It is sufficient to demand that u_{k−1} have the same residual for (E_1) and (E_k). Since x_{k−1} ∈ P_1, we get the following condition on λ:

λ(a_k^T v_{k−1} − a_1^T v_{k−1}) = b_k − a_k^T x_{k−1}.   (12)

If the term in the parentheses is non-zero, then we can compute λ and thereby u_{k−1}. If it is zero, then (E_k) is replaced by the equation (αa_k)^T x = αb_k, where α ∈ R and α ≠ 0, 1. To keep the absolute value of the residual unchanged, it is recommended to put α = −1 (in this case the new equation is also numerically the same as the original one). We assert that now the term in the parentheses is non-zero. Otherwise a_k^T v_{k−1} − a_1^T v_{k−1} = 0 and also αa_k^T v_{k−1} − a_1^T v_{k−1} = 0, which yields a_1^T v_{k−1} = 0, contradicting (10). Hence in either case we can determine u_{k−1}.

Now we are going to compute another equiresidual point w_{k−1} for (E_1),...,(E_k). We begin from the point w_0 = x_0 + a_k, and in the line L′ = {w_0 + λv_1 : λ ∈ R} we look for an equiresidual point w_1 for (E_1) and (E_2). This yields the following condition:

λ(a_2^T v_1 − a_1^T v_1) = b_2 − b_1 − (a_2 − a_1)^T w_0.

Since (11) holds, λ and thereby w_1 can be computed. Using w_1 and v_2 we find an equiresidual point w_2 for (E_1), (E_2) and (E_3), etc. Finally, in the line L″ = {w_{k−2} + λv_{k−1} : λ ∈ R} we find an equiresidual point w_{k−1} for (E_1),...,(E_k). Since L″ is equiresidual for (E_1),...,(E_{k−1}) (Corollary 1), it is sufficient to require that w_{k−1} be equiresidual for (E_1) and (E_k), which yields the
condition on λ:

λ(a_k^T v_{k−1} − a_1^T v_{k−1}) = b_k − b_1 − (a_k − a_1)^T w_{k−2}.

And again, the term in the parentheses being non-zero (as ensured above), one can compute λ and thereby w_{k−1}. Before continuing we state a crucial assertion.

Theorem 4. The following conclusions hold:
(a) If u_{k−1} ≠ w_{k−1}, then a_1,...,a_k are linearly independent.
(b) If u_{k−1} = w_{k−1} and x_{k−1} ∈ P_k, then equation (E_k) is dependent on (E_1),...,(E_{k−1}).
(c) If u_{k−1} = w_{k−1} and x_{k−1} ∉ P_k, then equation (E_k) is inconsistent with (E_1),...,(E_{k−1}).

Proof. At first notice that by Theorem 1 we have v_1,...,v_{k−1} ∈ span{a_1,...,a_{k−1}}. Thus u_{k−1} ∈ x_0 + span{a_1,...,a_{k−1}} and w_{k−1} ∈ x_0 + a_k + span{a_1,...,a_{k−1}}.

(a) For a contradiction, assume that a_1,...,a_k are linearly dependent. Then u_{k−1}, w_{k−1} ∈ x_0 + span{a_1,...,a_{k−1}}. The system of k − 1 linearly independent equations (E_1),...,(E_{k−1}) has a unique solution in the affine subspace x_0 + span{a_1,...,a_{k−1}}, namely x_{k−1}. This point is equiresidual (with zero residual) for (E_1),...,(E_{k−1}) and by Theorem 2 it is contained in exactly one ERO line for (E_1),...,(E_{k−1}), namely the line L_{k−1}, and the points u_{k−1} and w_{k−1} lie in it. Each of the points u_{k−1} and w_{k−1} has been uniquely determined as an equiresidual point in L_{k−1} for (E_1),...,(E_k), and therefore u_{k−1} = w_{k−1}. This contradicts our assumption.

(b) Suppose that a_1,...,a_k are linearly independent. Then span{a_1,...,a_{k−1}} ≠ a_k + span{a_1,...,a_{k−1}}. As u_{k−1} ∈ x_0 + span{a_1,...,a_{k−1}} and w_{k−1} ∈ x_0 + a_k + span{a_1,...,a_{k−1}} (because by Theorem 1, v_1,...,v_{k−1} ∈ span{a_1,...,a_{k−1}}), we have u_{k−1} ≠ w_{k−1}, a contradiction. Thus a_k is a linear combination of the vectors a_1,...,a_{k−1}. Moreover, the point x_{k−1}, which is a common point of (E_1),...,(E_{k−1}), fulfills also (E_k), and we conclude that equation (E_k) is dependent on (E_1),...,(E_{k−1}).

(c) The same proof as in (b) shows that a_k is a linear combination of the vectors a_1,...,a_{k−1}. Geometrically this means that the hyperplane P_k either contains the set S_{k−1} = P_1 ∩ ... ∩ P_{k−1} or these sets are disjoint. As x_{k−1} ∈ S_{k−1} and x_{k−1} ∉ P_k, the latter case occurs. □

Remark.
The last theorem does not hold if instead of u_{k−1} and w_{k−1} one considers arbitrary points u ∈ x_0 + span{a_1,...,a_{k−1}} and w ∈ (x_0 + a_k) + span{a_1,...,a_{k−1}} which are equiresidual for (E_1),...,(E_k). This can be demonstrated by the following example in R^3, where we have a system of four equations:

(1, 0, 1)x = 1   (E_1),
(0, 1, 1)x = 1   (E_2),
(0, −1, 1)x = 1   (E_3),
(−1, 0, 1)x = 1   (E_4).

Let x_0 = (0, 0, 0)^T. Then the points u = (0, 0, 0)^T ∈ x_0 + span{a_1, a_2, a_3} and w = (0, 0, 1)^T ∈ (x_0 + a_4) + span{a_1, a_2, a_3} are distinct and lie in the ERO line L = {(0, 0, 0)^T + λ(0, 0, 1)^T : λ ∈ R} for (E_1), (E_2) and (E_3). They are equiresidual for (E_1), (E_2), (E_3) and (E_4). Clearly, the vectors a_1, a_2, a_3 are linearly independent, but a_1, a_2, a_3, a_4 are not. Hence it is important that our algorithm changes (E_4) to the equation (1, 0, −1)x = −1. Then we get u = w.

Now we can continue the process as follows. If u_{k−1} ≠ w_{k−1}, then we put v_k = w_{k−1} − u_{k−1} and define L_k = {u_{k−1} + λv_k : λ ∈ R}. Since v_k ∈ span{a_1,...,a_k}, the line L_k is an ERO line for (E_1),...,(E_k) (by Lemma 2 and Theorem 1). According to Corollary 2, L_k ∩ P_1 consists of a single point x_k, which is the orthogonal projection of x_0 onto S_k = P_1 ∩ ... ∩ P_k. Theorem 4 tells us
that a_1,...,a_k are linearly independent, and thus we can go to the next iteration to scan equation (E_{k+1}) (if any). If u_{k−1} = w_{k−1} and x_{k−1} ∈ P_k, then by Theorem 4, (E_k) depends on the previous equations and can be skipped or deleted. The next equation (if any) is denoted again as (E_k) and scanned in the next iteration. Finally, if u_{k−1} = w_{k−1} and x_{k−1} ∉ P_k, then by Theorem 4 the system (7) is inconsistent and the algorithm halts. The following assertion summarizes some properties of the vectors v_i.

Theorem 5. Suppose that the scanned equation (E_k) led to a new vector v_k. Then we have:
(a) span{v_1,...,v_k} = span{a_1,...,a_k},
(b) v_k is orthogonal to the set S_k = P_1 ∩ ... ∩ P_k,
(c) a_i^T v_k = a_j^T v_k ≠ 0 whenever 1 ≤ i ≤ j ≤ k,
(d) a_j^T v_{j−1} − a_1^T v_{j−1} ≠ 0 whenever 1 < j ≤ k.

Proof. (a) We proceed by induction on k. The case k = 1 being trivial, we suppose that k ≥ 2. By the algorithm we have v_k = w_{k−1} − u_{k−1} ∈ a_k + span{a_1,...,a_{k−1}}. By the induction hypothesis we have span{v_1,...,v_{k−1}} = span{a_1,...,a_{k−1}}. Therefore, if v_k belonged to the latter set, then a_k would belong to it as well, a contradiction.

(b) During the development of our algorithm we have already proved that v_k is a direction vector of an ERO line for (E_1),...,(E_k), as desired. By Theorem 1 this immediately implies also (c). As to (d), it holds for j ≤ k − 1 by (11) and for j = k by the algorithm. □

We provide a summary of the algorithm in Table 1.

4. Complexity and extensions

In this section we give the computational complexity of our algorithm, several implementation remarks, and finally some extensions of the basic problem which are solvable by the method.

4.1. Computational complexity

We follow the algorithm as presented in Table 1. One sees that the initialization can be done in time O(n) (flops). Let us consider the kth large iteration. Except for the while cycle, every item can be computed in time O(n).
The jth small iteration (j = 2,...,k) requires 4n + 4 flops (it is supposed that the values (a_j − a_1)^T v_{j−1} are stored). Thus in total the while cycle can be done in (4n + 4)(k − 1) flops. Since k = 2,...,m, the overall complexity of the algorithm is

4(n + 1) Σ_{k=2}^{m} (k − 1) + O(mn) ≤ 2m^2 n + O(mn).

This is better than the complexity 2m^2 n + mn^2 + m^3 + O(mn) of computing the projection by (1) (here we assumed that the inverse of an m × m matrix requires m^3 + O(m^2) flops). On the other hand, in the case of a square nonsingular n × n matrix A we get 2n^3 + O(n^2), which is worse than the standard complexities (e.g. 2n^3/3 + O(n^2) for Gaussian elimination). As to memory requirements, the input data need m(n + 1) + n. Beyond that, our computation requires storage of at most mn (for the vectors v_k) and O(n) for the rest. Thus the overall memory requirement does not exceed 2mn + O(m + n). This is approximately the same as for computations by
(1), but worse than the O(n^2) needed in the case of square non-singular n × n matrices when solving linear equations by Gaussian elimination.

Table 1
Algorithm

INPUT: (1) a system of m linear equations a_i^T x = b_i with a_i ∈ R^n \ {0} and b_i ∈ R, 1 ≤ i ≤ m; (2) a vector x_0 ∈ R^n.
OUTPUT: (1) the maximal index p such that the subsystem consisting of the first p equations has a nonempty solution set S; (2) each of the first p equations which is linearly dependent on the previous equations is marked as dependent; (3) the orthogonal projection y of x_0 onto S.

INITIALIZATION:
  k = 1; p = 1; v_1 = a_1; x_1 = x_0 + [(b_1 − a_1^T x_0)/(a_1^T v_1)] v_1;
  if m = 1 then [y = x_1; HALT];
LARGE ITERATION:
  k = k + 1; {the working index of the next scanned equation is k}
  p = p + 1;
  if (a_k − a_1)^T v_{k−1} = 0 then [a_k = −a_k; b_k = −b_k];
  u_{k−1} = x_{k−1} + [(b_k − a_k^T x_{k−1})/((a_k − a_1)^T v_{k−1})] v_{k−1};
  w_0 = x_0 + a_k; j = 2;
  while j ≤ k do
    SMALL ITERATION:
    [w_{j−1} = w_{j−2} + [(b_j − b_1 − (a_j − a_1)^T w_{j−2})/((a_j − a_1)^T v_{j−1})] v_{j−1}; j = j + 1];
  if u_{k−1} ≠ w_{k−1} then
    [v_k = w_{k−1} − u_{k−1}; x_k = u_{k−1} + [(b_1 − a_1^T u_{k−1})/(a_1^T v_k)] v_k;
     if k = m then [y = x_k; HALT] else goto LARGE ITERATION];
  if a_k^T x_{k−1} = b_k then
    [mark the kth equation as dependent and do not consider it anymore;
     if k = m then HALT else [k = k − 1; goto LARGE ITERATION]];
  p = p − 1; HALT; {the kth equation is inconsistent}

4.2. Implementation remarks

Here we suggest some recommendations for implementation.
(1) Instead of w_0 = x_0 + a_k we can take w_0 = x_0 + βa_k for any real β ≠ 0. This should be applied to obtain summands comparable in norm.
(2) In the small iteration it suffices to keep in memory only the last vector w_{j−1}.
(3) Similarly, in the large iteration we need only the last vectors u_j, w_j and x_j.
(4) However, all vectors v_j should be kept, and it is recommended to keep also all the denominators (a_j − a_1)^T v_{j−1}.
(5) One can observe that in all scalar products a row a_i^T acts as a factor. Consequently, if the matrix A of the system is sparse, then the products can be performed with computational savings. Moreover, if the vector x_0 is sparse, then for small indices j the vectors u_j, w_j, v_j and x_j are also relatively sparse, and the sums can likewise be performed with savings. Clearly, as j becomes larger these vectors become denser and denser.

4.3. Some extensions of the basic problem

(1) Suppose that the rank of the matrix A is k. Once the vectors v_1,...,v_k have been computed, any right-hand side can be processed using no more than O(mn) flops, because the vectors are the same for all right-hand sides b (Theorem 2). For the first computation we can take any point for x_0, hence also the zero vector.
(2) Once the orthogonal projection of a point x_0 onto the solution set S of a system Ax = b has been found, the orthogonal projection of a new point x_0′ onto S can be computed in time O(kn), where k is the rank of A (Theorem 2).
(3) Theorem 2 ensures computational savings also in the case when we need to solve several systems of equations whose matrices have many first rows in common (because they then have many first vectors v_i in common).
(4) If a system of equations Ax = b is inconsistent, then we can solve the system of normal equations A^T A x = A^T b and thus find a least squares solution x of the original system (see e.g. [2, p. 439]), at least in theory.
(5) The whole solution set S of a system Ax = b can be found e.g. as follows. If A is an m × n matrix of rank k, then we first find the inconsistency or proj(0) and the rank in time at most 2m^2 n + O(mn). Then we choose a basis of R^n, for example the unit vectors e_1,...,e_n (the columns of the identity matrix), and find proj(e_1),...,proj(e_n) in time O(kn) each.
These projections generate S. Namely, S = proj(0) + span{proj(e_1) − proj(0),...,proj(e_n) − proj(0)}.
(6) Our method tends to produce solutions of systems Ax = b which minimize the absolute residual of the first equation. Therefore, if the absolute residual of a specific equation needs to be very small, then that equation should be placed first in the system. Another way is to keep its original position but to take the M-multiple of the equation with a big multiplier M; this can be applied even simultaneously for several equations (however, it can be cancelled by a scaling procedure).

5. Numerical experiments

In this section we present numerical results for systems Ax = b, where A is a dense matrix with m rows and n columns, m ≤ n. We used the software system Matlab 6 running on a PC with an Intel P4 3.2 GHz processor. As test data, we generated two kinds of matrices, denoted rand and hilb. In the former case we shifted a random matrix to get also negative elements: A = rand(m, n) − 0.5. In the latter case we generated a (rectangular) Hilbert matrix A with A(i, j) = 1/(i + j − 1). In all cases we received a full rank matrix A, and we took a vector z of 1s
Table 2
Finding projections: numerical results for random matrices
[Columns: m, n, method (invuse, lsqlin, proj), distance ‖y − x_0‖, relative residual ‖b − Ay‖/‖b‖, and cpu time in seconds; for the largest sizes (n up to 1,000,000) only proj is reported. The individual numerical entries are not legible in this transcription.]
(i.e. z = ones(n, 1)) to be a solution and defined b = Az. Although the presented results concern the ∞-norm, similar results were obtained also for the 1-norm and the 2-norm.

5.1. Orthogonal projections

We computed the orthogonal projection of a zero vector x_0 (x_0 = zeros(n, 1)) onto the solution set of the above system Ax = b. We compared three methods, referred to as invuse, lsqlin, and proj. invuse computes proj(x_0) by formula (1) and uses the Matlab function inv: C = A' * inv(A * A'), y = (I − C * A) * x_0 + C * b (the fact that x_0 = 0 was not exploited). lsqlin solves the least squares problem (2) with linear constraints by using the Matlab function lsqlin: y = lsqlin(I, x_0, [], [], A, b). The method proj is our projection method with zero tolerance 1e−14: y is its output vector (the orthogonal projection of x_0 onto the solution set of the system). In Table 2 we give the distances ‖y − x_0‖ between x_0 and the result y, the relative residuals ‖b − Ay‖/‖b‖, and the cpu times (in seconds) for various sizes of random matrices A. Similar results for Hilbert matrices are presented in Table 3. We had no exact projection of x_0 at hand except in the case when the system had a unique solution (m = n). Since the true proj(x_0) must lie in the solution set of the system Ax = b and minimizes the distance from x_0, the smaller the distance and the smaller the relative residual, the better the result.

Table 3
Finding projections: numerical results for Hilbert matrices
[Columns as in Table 2; the individual numerical entries are not legible in this transcription.]
Note that for n > 2000 the presented results cover only our method proj, because invuse and lsqlin were not able to handle such large problems and ended with the Matlab error message "out of memory".

5.2. Linear equations

We computed a solution of the above system Ax = b by using three Matlab methods and our projection method. The methods are referred to as rref, A\b, lsqr, and proj. These methods are applied as follows. rref (Gauss–Jordan elimination with partial pivoting): B = rref([A b]), u = B(1 : m, n + 1). To obtain a solution vector x ∈ R^n, the vector u was padded with zeros whenever m < n. A\b (Gaussian elimination and other techniques): x = A\b. lsqr (a least squares method with default tolerance 2.3e−6 and maximum number of iterations 2000): x = lsqr(A, b, tol, maxiter). proj (our projection method with zero tolerance 1e−4): x is the output (the orthogonal projection of the zero vector x0 onto the solution set of the system). In Table 4 we give the values ‖x − z‖, the relative residuals ‖b − Ax‖/‖b‖, and the cpu times (in seconds) for various sizes of random matrices A. Similar results for Hilbert matrices are presented in Table 5. Note that ‖x − z‖ equals the error if z is the only solution (m = n), but the error is not defined otherwise (m < n). The results show that the fastest method, A\b, was better than proj also in accuracy for random matrices; for some Hilbert matrices, however, our method proj gave smaller errors.

Table 4
Linear equations: numerical results for random matrices (for each size m × n and each of the methods rref, A\b, lsqr, proj: ‖x − z‖, the relative residual ‖b − Ax‖/‖b‖, and the cpu time in seconds)
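The accuracy measures used in Tables 4 and 5 can be illustrated with a small NumPy sketch (our own construction, not the paper's code): plant the solution z = ones for a Hilbert matrix, solve, and report both ‖x − z‖ and the relative residual:

```python
import numpy as np

# Sketch: reproduce the error metrics of Tables 4-5 on a small Hilbert system.
# The Hilbert matrix is built directly; scipy.linalg.hilbert would also serve.
n = 10
i, j = np.indices((n, n))
A = 1.0 / (i + j + 1.0)            # 10 x 10 Hilbert matrix, very ill-conditioned
z = np.ones(n)                     # planted exact solution
b = A @ z

x = np.linalg.solve(A, b)          # analogue of Matlab's A\b for square A
err = np.linalg.norm(x - z)        # well-defined here since m = n
rel_res = np.linalg.norm(b - A @ x) / np.linalg.norm(b)
```

Because cond(A) is of order 10^13, the forward error err can be many orders of magnitude larger than the backward-error-sized residual rel_res, which is the pattern visible in Table 5.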
Table 5
Linear equations: numerical results for Hilbert matrices (columns as in Table 4: m, n, method, ‖x − z‖, ‖b − Ax‖/‖b‖, cpu time in seconds)

6. Conclusion

We have presented a new idea for computing the orthogonal projection of a point onto an affine subspace. The numerical experiments with dense m × n matrices showed that in finding orthogonal projections our method was competitive, if not superior, to the other methods whenever m ≪ n. In the remaining cases our method provides satisfactory results for practical purposes. Clearly, our method can also serve for finding a solution of a system of linear equations. Although the theoretical computational complexity of our method for linear equations is worse than that of the standard methods, the computational experiments with dense matrices are satisfactory, even though our method could not compete with the standard methods. Note that we tested our raw method against well-equipped standard methods (incorporating various sophisticated techniques). We believe that better results can be expected once similar improvements are implemented in our method as well. Thus there is ample room for further research.

Acknowledgments

The author would like to thank the referee for his/her valuable comments and detailed suggestions, which led to an improvement of this paper.
References

[1] E.D. Andersen, Finding all linearly dependent rows in large-scale linear programming, Optim. Methods Softw. 6 (1995).
[2] M. Arioli, A. Laratta, Error analysis of algorithms for computing the projection of a point onto a linear manifold, Linear Algebra Appl. 82 (1986) 1–26.
[3] M. Benzi, C.D. Meyer, A direct projection method for sparse linear systems, SIAM J. Sci. Comput. 16 (1995).
[4] M. Benzi, C.D. Meyer, M. Tůma, A sparse approximate inverse preconditioner for the conjugate gradient method, SIAM J. Sci. Comput. 17 (1996).
[5] L.M. Bregman, Y. Censor, S. Reich, Y. Zepkowitz-Malachi, Finding the projection of a point onto the intersection of convex sets via projections onto half-spaces, J. Approximation Theory 124 (2003).
[6] C. Brezinski, Projection Methods for Systems of Equations, North-Holland, Amsterdam, 1997.
[7] G. Cimmino, Calcolo approssimato per le soluzioni dei sistemi di equazioni lineari, La Ricerca Scientifica XVI, Series II, Anno IX, vol. 1, 1938.
[8] A. Dax, Line search acceleration of iterative methods, Linear Algebra Appl. 130 (1990).
[9] J. Demmel, B. Diament, G. Malajovich, On the complexity of computing error bounds, Found. Comput. Math. 1 (2001).
[10] J. Ding, Perturbation analysis for the projection of a point to an affine set, Linear Algebra Appl. 191 (1993).
[11] J. Ding, Perturbation results for projecting a point onto a linear manifold, SIAM J. Matrix Anal. Appl. 19 (1998).
[12] T. Elfving, A projection method for semidefinite linear systems and its applications, Linear Algebra Appl. 391 (2004).
[13] D.K. Faddeev, V.N. Faddeeva, Computational Methods of Linear Algebra (Russian), Fizmatgiz, Moscow, 1960 (English translation: Freeman, San Francisco, 1963).
[14] M. Fiedler, Special Matrices and their Applications in Numerical Mathematics (Czech), SNTL, Prague, 1980 (English translation (updated): Kluwer, Dordrecht; SNTL, Prague, 1986).
[15] G.E. Forsythe, Solving linear equations can be interesting, Bull. Amer. Math. Soc. 59 (1953).
[16] L. Fox, A short account of relaxation methods, Quarterly J. Mech. Appl. Math. 1 (1948).
[17] G. Golub, C.F. Van Loan, Matrix Computations, third ed., The Johns Hopkins University Press, Baltimore, MD, 1996.
[18] N.J. Higham, Accuracy and Stability of Numerical Algorithms, second ed., SIAM, Philadelphia, PA, 1996.
[19] S. Kaczmarz, Angenäherte Auflösung von Systemen linearer Gleichungen, Bull. Acad. Polon. Sci. Lett. (Cracovie), Classe Sci. Math. Natur., Série A, Sci. Math. 35 (1937).
[20] C.H. Lai, On Purcell's method and the Gauss–Jordan elimination method, Int. J. Math. Educ. Sci. Technol. 25 (1994).
[21] C.D. Meyer, Matrix Analysis and Applied Linear Algebra, SIAM, Philadelphia, PA, 2000.
[22] A.R.L. Oliveira, D.C. Sorensen, A new class of preconditioners for large-scale linear systems from interior point methods for linear programming, Linear Algebra Appl. 394 (2005) 1–24.
[23] E.W. Purcell, The vector method for solving simultaneous linear equations, J. Math. Phys. 32 (1954).
[24] A. Quarteroni, R. Sacco, F. Saleri, Numerical Mathematics, Springer-Verlag, New York, 2000.
[25] Y. Saad, H.A. van der Vorst, Iterative solution of linear systems in the 20th century, J. Comput. Appl. Math. 123 (2000) 1–33.
[26] Y. Shi, Solving linear systems involved in constrained optimization, Linear Algebra Appl. 229 (1995).
[27] F. Sloboda, A parallel projection method for linear algebraic systems, Appl. Math. 23 (1978).
[28] M. Wei, On the error estimate for the projection of a point onto a linear manifold, Linear Algebra Appl. 133 (1990).
[29] J.H. Wilkinson, Modern error analysis, SIAM Rev. 13 (1971).
[30] X.-Y. Wu, R. Shao, G.-H. Xue, Iterative refinement of solution with biparameter for solving ill-conditioned systems of linear algebraic equations, Appl. Math. Comput. 131 (2002).
Applied Mathematics 205 Unit II: Numerical Linear Algebra Lecturer: Dr. David Knezevic Unit II: Numerical Linear Algebra Chapter II.3: QR Factorization, SVD 2 / 66 QR Factorization 3 / 66 QR Factorization
More informationImproved Newton s method with exact line searches to solve quadratic matrix equation
Journal of Computational and Applied Mathematics 222 (2008) 645 654 wwwelseviercom/locate/cam Improved Newton s method with exact line searches to solve quadratic matrix equation Jian-hui Long, Xi-yan
More informationSummary of Iterative Methods for Non-symmetric Linear Equations That Are Related to the Conjugate Gradient (CG) Method
Summary of Iterative Methods for Non-symmetric Linear Equations That Are Related to the Conjugate Gradient (CG) Method Leslie Foster 11-5-2012 We will discuss the FOM (full orthogonalization method), CG,
More informationLinear Algebra Highlights
Linear Algebra Highlights Chapter 1 A linear equation in n variables is of the form a 1 x 1 + a 2 x 2 + + a n x n. We can have m equations in n variables, a system of linear equations, which we want to
More informationOPTIMAL SCALING FOR P -NORMS AND COMPONENTWISE DISTANCE TO SINGULARITY
published in IMA Journal of Numerical Analysis (IMAJNA), Vol. 23, 1-9, 23. OPTIMAL SCALING FOR P -NORMS AND COMPONENTWISE DISTANCE TO SINGULARITY SIEGFRIED M. RUMP Abstract. In this note we give lower
More informationLinear Algebra. Preliminary Lecture Notes
Linear Algebra Preliminary Lecture Notes Adolfo J. Rumbos c Draft date April 29, 23 2 Contents Motivation for the course 5 2 Euclidean n dimensional Space 7 2. Definition of n Dimensional Euclidean Space...........
More informationLecture 3: Linear Algebra Review, Part II
Lecture 3: Linear Algebra Review, Part II Brian Borchers January 4, Linear Independence Definition The vectors v, v,..., v n are linearly independent if the system of equations c v + c v +...+ c n v n
More information