Formulation of a Preconditioned Algorithm for the Conjugate Gradient Squared Method in Accordance with Its Logical Structure


Applied Mathematics, 2015, 6, 1389-1406
Published Online July 2015 in SciRes. http://www.scirp.org/journal/am
http://dx.doi.org/10.4236/am.2015.68131

Shoji Itoh (1), Masaaki Sugihara (2)
(1) Division of Science, School of Science and Engineering, Tokyo Denki University, Saitama, Japan
(2) Department of Physics and Mathematics, College of Science and Engineering, Aoyama Gakuin University, Kanagawa, Japan
Email: itosho@acm.org

Received 22 June 2015; accepted 27 July 2015; published 30 July 2015

Copyright 2015 by authors and Scientific Research Publishing Inc.
This work is licensed under the Creative Commons Attribution-NonCommercial International License (CC BY-NC).
http://creativecommons.org/licenses/by-nc/4.0/

Abstract
In this paper, we propose an improved preconditioned algorithm for the conjugate gradient squared method (improved PCGS) for the solution of linear equations. Further, the logical structures underlying the formation of this preconditioned algorithm are demonstrated via a number of theorems. This improved PCGS algorithm retains some mathematical properties that are associated with the CGS derivation from the bi-conjugate gradient (BiCG) method under a non-preconditioned system. A series of numerical comparisons with the conventional PCGS illustrates the enhanced effectiveness of our improved scheme with a variety of preconditioners. This logical structure underlying the formation of the improved PCGS brings a spillover effect from various bi-Lanczos-type algorithms with minimal residual operations, because these algorithms were constructed by adopting the idea behind the derivation of CGS. These bi-Lanczos-type algorithms are very important because they are often adopted to solve the systems of linear equations that arise from large-scale numerical simulations.

Keywords
Linear Systems, Krylov Subspace Method, Bi-Lanczos Algorithm, Preconditioned System, PCGS

1. Introduction
In scientific and technical computation, natural phenomena or engineering problems are described through numerical models. These models are often reduced to a system of linear equations:

A x = b,  (1)

where A is a large, sparse coefficient matrix of size n x n, x is the solution vector, and b is the right-hand side (RHS) vector.

The conjugate gradient squared (CGS) method is a way to solve (1) [1]. The CGS method is a type of bi-Lanczos algorithm that belongs to the class of Krylov subspace methods. Bi-Lanczos-type algorithms are derived from the bi-conjugate gradient (BiCG) method [2] [3], which assumes the existence of a dual system

A^T x* = b*.  (2)

Characteristically, the coefficient matrix of (2) is the transpose of A. In this paper, we term (2) a "shadow system". Bi-Lanczos-type algorithms have the advantage of requiring less memory than Arnoldi-type algorithms, another class of Krylov subspace methods.

The CGS method is derived from BiCG. Furthermore, various bi-Lanczos-type algorithms, such as BiCGStab [4], GPBiCG [5], and so on, have been constructed by adopting the idea behind the derivation of CGS. These bi-Lanczos-type algorithms are very important because they are often adopted to solve systems in the form of (1) that arise from large-scale numerical simulations.

Many iterative methods, including bi-Lanczos algorithms, are often applied together with some preconditioning operation. Such algorithms are called preconditioned algorithms; for example, preconditioned CGS (PCGS). The application of preconditioning operations to iterative methods effectively enhances their performance. Indeed, the effects attributable to different preconditioning operations are greater than those produced by different iterative methods [6]. However, if a preconditioned algorithm is poorly designed, there may be no beneficial effect from the preconditioning operation. Consequently, PCGS holds an important position within the Krylov subspace methods.

In this paper, we identify a mathematical issue with the conventional PCGS algorithm and propose an improved PCGS. [Footnote 1: This improved PCGS was proposed in Ref. [7] from the viewpoint of the deriving process's logic, but this manuscript demonstrates some mathematical theorems underlying that logic. Further, numerical results using only the ILU(0) preconditioner are shown in Ref. [7], whereas results using a variety of typical preconditioners are shown in this manuscript.] This improved PCGS algorithm is derived rationally in accordance with its logical structure.

In this paper, "preconditioned algorithm" and "preconditioned system" refer to solving algorithms described with some preconditioning operator M (or preconditioner, preconditioning matrix) and to the system converted by the operator based on M, respectively. These terms never indicate the algorithm for the preconditioning operation itself, such as incomplete LU decomposition, approximate inverse, and so on. For example, for a preconditioned system, the original linear system (1) becomes

A~ x~ = b~,  (3)
A~ = M_L^{-1} A M_R^{-1},  x~ = M_R x,  b~ = M_L^{-1} b  (4)

under the preconditioner M = M_L M_R (M approximately equal to A). Here, the matrix and the vectors in the preconditioned system are denoted by the tilde (~). However, the conversions in (3) and (4) are not implemented; rather, we construct the preconditioned algorithm that is equivalent to solving (3).

This paper is organized as follows. Section 2 provides an overview of the derivation of the CGS method and the properties of its two scalar coefficients (alpha_k and beta_k). This forms an important part of our argument for the preconditioned BiCG and PCGS algorithms in the next section. Section 3 introduces the PCGS algorithm. An issue with the conventional PCGS algorithm is identified, and an improved algorithm is proposed. In Section 4, we present some numerical results to demonstrate the effect of the improved PCGS algorithm with a variety of preconditioners. As a consequence, the effectiveness of the improved algorithm is clearly established. Finally, our conclusions are presented in Section 5.

2. Derivation of the CGS Method and Preconditioned Algorithm of the BiCG Method

In this section, we derive the CGS method from the BiCG method and introduce the preconditioned BiCG algorithm.

2.1. Structure of the BiCG Method

BiCG [2] [3] is an iterative method for linear systems in which the coefficient matrix A is nonsymmetric. The algorithm proceeds as follows:

Algorithm 1. BiCG method:
x_0 is an initial guess, r_0 = b - A x_0, set beta_{-1} = 0,
(r*_0, r_0) != 0, e.g., r*_0 = r_0,
For k = 0, 1, 2, ... until convergence, Do:
  p_k = r_k + beta_{k-1} p_{k-1},
  p*_k = r*_k + beta_{k-1} p*_{k-1},
  alpha_k = (r*_k, r_k) / (p*_k, A p_k),  (5)
  x_{k+1} = x_k + alpha_k p_k,
  r_{k+1} = r_k - alpha_k A p_k,
  r*_{k+1} = r*_k - alpha_k A^T p*_k,
  beta_k = (r*_{k+1}, r_{k+1}) / (r*_k, r_k),  (6)
End Do

BiCG implements the following theorems.

Theorem 1 (Hestenes et al. [8], Sonneveld [1], and others). [Footnote 2: Of course, [8] discusses the conjugate gradient method for a symmetric system.] For a non-preconditioned system, there are recurrence relations that define the degree k of the residual polynomial R_k(lambda) and of the probing direction polynomial P_k(lambda). These are

R_0(lambda) = 1,  P_0(lambda) = 1,  (7)
R_k(lambda) = R_{k-1}(lambda) - alpha_{k-1} lambda P_{k-1}(lambda),  (8)
P_k(lambda) = R_k(lambda) + beta_{k-1} P_{k-1}(lambda),  (9)

where alpha_k and beta_k are based on (5) and (6).

Using the polynomials of Theorem 1, the residual vector for the linear system (1) and the shadow residual vector for the shadow system (2) can be written as

r_k = R_k(A) r_0,  (10)
r*_k = R_k(A^T) r*_0.  (11)

The probing direction vectors are represented by

p_k = P_k(A) r_0,  (12)
p*_k = P_k(A^T) r*_0,  (13)

where r_0 = b - A x_0 is the initial residual vector and r*_0 = b* - A^T x*_0 is the initial shadow residual vector. However, in practice, r*_0 is set in such a way as to satisfy the condition (r*_0, r_0) != 0; (u, v) denotes the inner product between the vectors u and v. In this paper, we set r*_0 = r_0 to satisfy the condition (r*_0, r_0) != 0
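As a concrete illustration, Algorithm 1 can be sketched in a few lines of Python with NumPy. This is a dense-matrix toy of our own, not the paper's implementation; in practice A is sparse, and the stopping test on the residual norm shown here is one common choice:

```python
import numpy as np

def bicg(A, b, tol=1e-12, maxiter=None):
    """Sketch of Algorithm 1 (BiCG) with the shadow residual chosen as r*_0 = r_0."""
    n = b.size
    x = np.zeros(n)
    r = b - A @ x                      # r_0
    rs = r.copy()                      # r*_0 = r_0, so (r*_0, r_0) != 0
    p = np.zeros(n)
    ps = np.zeros(n)
    beta = 0.0                         # beta_{-1} = 0
    for _ in range(maxiter or 2 * n):
        p = r + beta * p               # p_k
        ps = rs + beta * ps            # p*_k
        Ap = A @ p
        alpha = (rs @ r) / (ps @ Ap)   # (5)
        x = x + alpha * p
        r_new = r - alpha * Ap
        rs_new = rs - alpha * (A.T @ ps)
        beta = (rs_new @ r_new) / (rs @ r)   # (6)
        r, rs = r_new, rs_new
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
    return x
```

On a well-conditioned nonsymmetric test matrix (a dominant diagonal plus a small random perturbation), this drives the relative residual to the 1e-12 level in a handful of iterations; note that, unlike CGS below, each step needs a product with A^T.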

exactly. This is a typical way of ensuring (r*_0, r_0) != 0; other settings are beyond the scope of this paper.

Theorem 2 (Fletcher [2]). The BiCG method satisfies the following conditions:

(r*_i, r_j) = 0 (i != j): bi-orthogonality conditions,  (14)
(p*_i, A p_j) = 0 (i != j): bi-conjugacy conditions.  (15)

2.2. Derivation of the CGS Method

The CGS method is derived by transforming the scalar coefficients in the BiCG method to avoid the matrix A^T [1]. The polynomials defined by (10)-(13) are substituted into (14) and (15), which construct the numerators and denominators of alpha_k and beta_k in BiCG. [Footnote 3: In this paper, if we specifically distinguish this algorithm, we write alpha_k^BiCG and beta_k^BiCG.] Then

(r*_k, r_k) = (R_k(A^T) r*_0, R_k(A) r_0) = (r*_0, R_k^2(A) r_0),  (16)
(p*_k, A p_k) = (P_k(A^T) r*_0, A P_k(A) r_0) = (r*_0, A P_k^2(A) r_0),  (17)

and the following theorem can be applied.

Theorem 3 (Sonneveld [1]). The CGS coefficients alpha_k^CGS and beta_k^CGS are equivalent to the corresponding coefficients in BiCG under certain transformation and substitution operations based on the bi-orthogonality and bi-conjugacy conditions.

Proof. We apply

r_k^CGS = R_k^2(A) r_0,  p_k^CGS = P_k^2(A) r_0  (18)

to (16) and (17). Then

alpha_k^BiCG = (r*_k, r_k) / (p*_k, A p_k) = (r*_0, R_k^2(A) r_0) / (r*_0, A P_k^2(A) r_0) = (r*_0, r_k^CGS) / (r*_0, A p_k^CGS) = alpha_k^CGS,  (19)
beta_k^BiCG = (r*_{k+1}, r_{k+1}) / (r*_k, r_k) = (r*_0, R_{k+1}^2(A) r_0) / (r*_0, R_k^2(A) r_0) = (r*_0, r_{k+1}^CGS) / (r*_0, r_k^CGS) = beta_k^CGS.  (20)

The CGS method is derived from BiCG by Theorem 4.

Theorem 4 (Sonneveld [1]). The CGS method is derived from the linear system's recurrence relations in the BiCG method under the property of equivalence between the coefficients alpha_k, beta_k in CGS and BiCG. However, the solution vector x_k^CGS is derived from a recurrence relation based on r_k^CGS = b - A x_k^CGS.

Proof. The coefficients alpha_k and beta_k in BiCG and CGS were derived in Theorem 3. The recurrence relations (8) and (9) for BiCG are squared to give:

R_k^2(lambda) = R_{k-1}^2(lambda) - 2 alpha_{k-1} lambda P_{k-1}(lambda) R_{k-1}(lambda) + alpha_{k-1}^2 lambda^2 P_{k-1}^2(lambda),  (21)
P_k^2(lambda) = R_k^2(lambda) + 2 beta_{k-1} P_{k-1}(lambda) R_k(lambda) + beta_{k-1}^2 P_{k-1}^2(lambda).  (22)

Further, we can apply r_k^CGS = R_k^2(A) r_0 and p_k^CGS = P_k^2(A) r_0 from (18), and substitute u_k = P_k(A) R_k(A) r_0 and q_k = P_k(A) R_{k+1}(A) r_0. Thus, we have derived the CGS method. [Footnote 4: In this paper, we use the superscript CGS alongside alpha, beta, and the relevant vectors to describe this algorithm.]
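Theorem 3 can be checked numerically: running BiCG and CGS side by side on the same system, with the same r*_0 = r_0, produces the same coefficient sequences alpha_k, beta_k up to rounding. The sketch below (Python/NumPy; the small diagonally dominant test matrix is our own illustrative choice) records three steps of each and compares them:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
A = 4.0 * np.eye(n) + rng.uniform(-1.0, 1.0, (n, n)) / n   # well conditioned, nonsymmetric
b = A @ np.ones(n)

# --- three BiCG steps (Algorithm 1, r*_0 = r_0), recording (alpha_k, beta_k) ---
x = np.zeros(n); r = b - A @ x; rs = r.copy()
p = np.zeros(n); ps = np.zeros(n); beta = 0.0
ab_bicg = []
for _ in range(3):
    p = r + beta * p
    ps = rs + beta * ps
    alpha = (rs @ r) / (ps @ (A @ p))
    x = x + alpha * p
    r_new = r - alpha * (A @ p)
    rs_new = rs - alpha * (A.T @ ps)
    beta = (rs_new @ r_new) / (rs @ r)
    ab_bicg.append((alpha, beta))
    r, rs = r_new, rs_new

# --- three CGS steps (Algorithm 2), recording (alpha_k, beta_k) ---
x = np.zeros(n); r = b - A @ x; rs0 = r.copy()
p = np.zeros(n); q = np.zeros(n); beta = 0.0
ab_cgs = []
for _ in range(3):
    u = r + beta * q
    p = u + beta * (q + beta * p)          # p_k = P_k^2(A) r_0
    Ap = A @ p
    alpha = (rs0 @ r) / (rs0 @ Ap)         # (19): only r*_0 appears
    q = u - alpha * Ap
    r_new = r - alpha * (A @ (u + q))
    beta = (rs0 @ r_new) / (rs0 @ r)       # (20)
    ab_cgs.append((alpha, beta))
    r = r_new

# (18) makes the two coefficient sequences identical in exact arithmetic.
assert np.allclose(ab_bicg, ab_cgs, rtol=1e-6, atol=1e-10)
```

The agreement is the content of (19) and (20): the CGS inner products involve only the fixed vector r*_0, so no product with A^T is ever formed.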

Algorithm 2. CGS method:
x_0 is an initial guess, r_0 = b - A x_0, set beta_{-1} = 0,
(r*_0, r_0) != 0, e.g., r*_0 = r_0,
For k = 0, 1, 2, ... until convergence, Do:
  u_k = r_k + beta_{k-1} q_{k-1},
  p_k = u_k + beta_{k-1} (q_{k-1} + beta_{k-1} p_{k-1}),
  alpha_k = (r*_0, r_k) / (r*_0, A p_k),
  q_k = u_k - alpha_k A p_k,
  x_{k+1} = x_k + alpha_k (u_k + q_k),
  r_{k+1} = r_k - alpha_k A (u_k + q_k),
  beta_k = (r*_0, r_{k+1}) / (r*_0, r_k),
End Do

The following Proposition 5 and Corollary 1 are given as a supplementary explanation for Algorithm 2. These are almost trivial, but they are comparatively important in the next section's discussion.

Proposition 5. There exist the following relations:

r_0^CGS = r_0,  (23)
r*_0^CGS = r*_0,  (24)

where r_0^CGS is the initial residual vector at k = 0 in the iterative part of CGS, r*_0^CGS is the initial shadow residual vector in CGS, r_0 is the initial residual vector, and r*_0 is the initial shadow residual vector.

Proof. Equation (23) follows because (18) for k = 0 gives r_0^CGS = R_0^2(A) r_0 = r_0; here R_0^2(A) = I by (7), and I denotes the identity matrix. Equation (24) is derived as follows. Applying (18) to (16), we obtain

(r*_k, r_k) = (R_k(A^T) r*_0, R_k(A) r_0) = (r*_0, R_k^2(A) r_0).

This equation shows that the inner product of the CGS on the right is obtained from the inner product of the BiCG on the left. Therefore, the r*_0 that composes the polynomial R_k(A^T) r*_0 expressing the shadow residual vector of the BiCG is the same as r*_0 in CGS. Hereafter, r*_0^CGS can be represented by r*_0 to the extent that neither can be distinguished.

Corollary 1. There exists the following relation:

p_0^CGS = r_0^CGS = r_0,

where p_0^CGS is the probing direction vector in CGS, r_0^CGS is the initial residual vector at k = 0 in the iterative part of CGS, and r_0 is the initial residual vector.

2.3. Derivation of the Preconditioned BiCG Algorithm

In this subsection, the preconditioned BiCG algorithm is derived from the non-preconditioned BiCG method (Algorithm 1). First, some basic aspects of the BiCG method under a preconditioned system are expressed, and a standard preconditioned BiCG algorithm is given.

When the BiCG method (Algorithm 1) is applied to linear equations under a preconditioned system:
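Algorithm 2 can be sketched as a runnable routine in the same style as the BiCG sketch above (Python/NumPy, a dense illustrative toy rather than a production solver):

```python
import numpy as np

def cgs(A, b, tol=1e-12, maxiter=None):
    """Sketch of Algorithm 2 (CGS); the shadow residual r*_0 is fixed at r_0."""
    n = b.size
    x = np.zeros(n)
    r = b - A @ x
    rs0 = r.copy()                     # r*_0; only this fixed vector enters inner products
    p = np.zeros(n)
    q = np.zeros(n)
    beta = 0.0                         # beta_{-1} = 0
    for _ in range(maxiter or 2 * n):
        u = r + beta * q
        p = u + beta * (q + beta * p)
        Ap = A @ p
        alpha = (rs0 @ r) / (rs0 @ Ap)
        q = u - alpha * Ap
        x = x + alpha * (u + q)
        r_new = r - alpha * (A @ (u + q))    # no product with A^T appears anywhere
        beta = (rs0 @ r_new) / (rs0 @ r)
        r = r_new
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
    return x
```

The structural point is visible in the code: the transpose product of BiCG has disappeared, at the price of squaring the residual polynomial, which is why CGS convergence curves are often more irregular than BiCG's.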

A~ x~ = b~,  (25)

we obtain a BiCG method under a preconditioned system (Algorithm 3). In this paper, matrices and vectors under the preconditioned system are denoted with a tilde, such as A~, and the coefficients alpha_k, beta_k are specified as alpha~_k, beta~_k. [Footnote 5: If we wish to emphasize different methods, a superscript is applied to the relevant vectors to denote the method, such as p~_k^BiCG.]

Algorithm 3. BiCG method under the preconditioned system:
x~_0 is an initial guess, r~_0 = b~ - A~ x~_0, set beta~_{-1} = 0,
(r~*_0, r~_0) != 0, e.g., r~*_0 = r~_0,
For k = 0, 1, 2, ... until convergence, Do:
  p~_k = r~_k + beta~_{k-1} p~_{k-1},
  p~*_k = r~*_k + beta~_{k-1} p~*_{k-1},
  alpha~_k = (r~*_k, r~_k) / (p~*_k, A~ p~_k),  (26)
  x~_{k+1} = x~_k + alpha~_k p~_k,
  r~_{k+1} = r~_k - alpha~_k A~ p~_k,
  r~*_{k+1} = r~*_k - alpha~_k A~^T p~*_k,
  beta~_k = (r~*_{k+1}, r~_{k+1}) / (r~*_k, r~_k),  (27)
End Do

We now state Theorem 6 and Theorem 7, which are clearly derived from Theorem 1 and Theorem 2, respectively.

Theorem 6. Under the preconditioned system, there are recurrence relations that define the degree k of the residual polynomial R~_k(lambda~) and the probing direction polynomial P~_k(lambda~). [Footnote 6: We mark the polynomials R~ and P~ with a tilde to denote the preconditioned system.] These are

R~_0(lambda~) = 1,  P~_0(lambda~) = 1,  (28)
R~_k(lambda~) = R~_{k-1}(lambda~) - alpha~_{k-1} lambda~ P~_{k-1}(lambda~),  (29)
P~_k(lambda~) = R~_k(lambda~) + beta~_{k-1} P~_{k-1}(lambda~),  (30)

where lambda~ is the variable under the preconditioned system, and alpha~_k, beta~_k in these relations are based on (26) and (27).

Using the polynomials of Theorem 6, the residual vectors of the preconditioned linear system (25) and the shadow residual vectors of the following preconditioned shadow system:

A~^T x~* = b~*  (31)

can be represented as

r~_k = R~_k(A~) r~_0,  (32)
r~*_k = R~_k(A~^T) r~*_0,  (33)

respectively. The probing direction vectors are given by

p~_k = P~_k(A~) r~_0,  (34)

p~*_k = P~_k(A~^T) r~*_0,  (35)

respectively. Under the preconditioned system, r~*_0 = r~_0. In this paper, we set r~*_0 based on the equation r~*_0 = r~_0 to satisfy the condition (r~*_0, r~_0) != 0 exactly. Some variations based on (r~*_0, r~_0) != 0, such as Algorithm 4 below, are allowed; other settings are beyond the scope of this paper.

Remark 1. The shadow system given by (31) does not exist in practice, but it is very important to construct systems in which the transpose of the matrix A~ exists.

Theorem 7. The BiCG method under the preconditioned system satisfies the following conditions:

(r~*_i, r~_j) = 0 (i != j): bi-orthogonality conditions under the preconditioned system,  (36)
(p~*_i, A~ p~_j) = 0 (i != j): bi-conjugacy conditions under the preconditioned system.  (37)

Next, we derive the standard PBiCG algorithm. Here, the preconditioned linear system (25) and its shadow system (31) are formed as follows:

M_L^{-1} A M_R^{-1} (M_R x) = M_L^{-1} b,  (38)
M_R^{-T} A^T M_L^{-T} (M_L^T x*) = M_R^{-T} b*.  (39)

Definition 1. On the subject of the PBiCG algorithm, the solution vector is denoted as x_k. Furthermore, the residual vectors of the linear system and the shadow system of the PBiCG algorithm are written as r_k and r*_k, respectively, and their probing direction vectors are p_k and p*_k, respectively.

Using this notation, each vector of the BiCG method under the preconditioned system given by Algorithm 3 is converted as below:

r~_k = M_L^{-1} r_k,  r~*_k = M_R^{-T} r*_k,  p~_k = M_R p_k,  p~*_k = M_L^T p*_k.  (40)

Substituting the elements of (40) into (36) and (37), we have

(r~*_i, r~_j) = (M_R^{-T} r*_i, M_L^{-1} r_j) = (r*_i, M^{-1} r_j),  (41)
(p~*_i, A~ p~_j) = (M_L^T p*_i, M_L^{-1} A M_R^{-1} M_R p_j) = (p*_i, A p_j).  (42)

Consequently, (26) and (27) become

alpha~_k = (r*_k, M^{-1} r_k) / (p*_k, A p_k),  (43)
beta~_k = (r*_{k+1}, M^{-1} r_{k+1}) / (r*_k, M^{-1} r_k).  (44)

Before the iterative step, we give the following Definition 2.

Definition 2. For some preconditioned algorithms, the initial residual vector of the linear system is written as r_0^P and the initial shadow residual vector of the shadow system is written as r*_0^P before the iterative step.

We adopt the following preconditioning conversion after (40):

r~_0 = M_L^{-1} r_0^P,  r~*_0 = M_R^{-T} r*_0^P.  (45)

Consequently, we can derive the following standard PBiCG algorithm [9] [10].

Algorithm 4. Standard preconditioned BiCG algorithm:
x_0 is an initial guess, r_0^P = b - A x_0, set beta_{-1} = 0,

(r~*_0, r~_0) = (r*_0^P, M^{-1} r_0^P) != 0, e.g., r*_0^P = M^{-1} r_0^P,
For k = 0, 1, 2, ... until convergence, Do:
  p_k = M^{-1} r_k + beta_{k-1} p_{k-1},
  p*_k = M^{-T} r*_k + beta_{k-1} p*_{k-1},
  alpha_k = (r*_k, M^{-1} r_k) / (p*_k, A p_k),  (46)
  x_{k+1} = x_k + alpha_k p_k,
  r_{k+1} = r_k - alpha_k A p_k,  (47)
  r*_{k+1} = r*_k - alpha_k A^T p*_k,  (48)
  beta_k = (r*_{k+1}, M^{-1} r_{k+1}) / (r*_k, M^{-1} r_k),  (49)
End Do

Algorithm 4 satisfies r_0 = r_0^P and r*_0 = r*_0^P when k = 0 in the iterative part.

Remark 2. Because we apply a preconditioning conversion such as x~_k = M_R x_k in the iteration of the BiCG method under the preconditioned system (Algorithm 3), x~_k != x_k in the iteration of Algorithm 4. Further, the initial solution to Algorithm 3 is technically x~_0, but this is actually calculated by multiplying M_R by the initial solution x_0.

In this section, we have shown that alpha_k^BiCG is equivalent to alpha_k^CGS using (19), and that beta_k^BiCG is equivalent to beta_k^CGS using (20). In the next section, we propose an improved PCGS algorithm by applying this result to the preconditioned system.

3. Improved PCGS Algorithm

In this section, we first explain the derivation of PCGS and present the conventional PCGS algorithm. We identify an issue with this conventional PCGS algorithm and propose an improved PCGS that overcomes this issue.

3.1. Derivation of the PCGS Algorithm

Figure 1 illustrates the logical structure of the solving methods and preconditioned algorithms discussed in this paper.

[Figure 1. Relation between BiCG, CGS, and their preconditioned algorithms. BiCG leads to CGS by the CGS derivation (alpha_k^BiCG = alpha_k^CGS, beta_k^BiCG = beta_k^CGS) and to PBiCG by the preconditioning conversion. The conventional PCGS applies the preconditioning conversion to CGS, whereas the improved PCGS (our proposal) applies the CGS derivation to PBiCG with alpha~_k, beta~_k.]
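A runnable sketch of Algorithm 4 follows (Python/NumPy). The explicit inverse of M is purely for illustration; a real implementation applies M^{-1} and M^{-T} through diagonal or triangular solves, and the Jacobi choice M = diag(A) in the accompanying test is our own assumption, not one made in the paper:

```python
import numpy as np

def pbicg(A, b, M, tol=1e-12, maxiter=None):
    """Sketch of Algorithm 4 (standard preconditioned BiCG) with r*_0 = M^-1 r_0."""
    n = b.size
    Minv = np.linalg.inv(M)            # illustration only: stands in for a preconditioner solve
    x = np.zeros(n)
    r = b - A @ x                      # r_0^P
    rs = Minv @ r                      # r*_0^P, so (r*_0, M^-1 r_0) != 0
    p = np.zeros(n)
    ps = np.zeros(n)
    beta = 0.0
    for _ in range(maxiter or 2 * n):
        z = Minv @ r                   # M^-1 r_k
        p = z + beta * p
        ps = Minv.T @ rs + beta * ps   # M^-T r*_k + beta p*_{k-1}
        Ap = A @ p
        rho = rs @ z                   # (r*_k, M^-1 r_k)
        alpha = rho / (ps @ Ap)        # (46)
        x = x + alpha * p
        r = r - alpha * Ap             # (47)
        rs = rs - alpha * (A.T @ ps)   # (48)
        beta = (rs @ (Minv @ r)) / rho # (49)
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
    return x
```

The sketch makes the key property of Algorithm 4 visible: the recurrences run entirely in the unconverted quantities r_k, p_k, and M enters only through solves, never through the explicit formation of A~.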

Typically, PCGS algorithms are derived via a CGS method under a preconditioned system (Algorithm 5). Algorithm 5 is derived by applying the CGS method (Algorithm 2) to the preconditioned linear system (25). In this section, the vectors and the coefficients alpha~_k, beta~_k are PCGS elements, except for those before the iteration (Definition 2). If we wish to emphasize different methods, we apply a superscript to the relevant vectors, such as alpha~_k^CGS, p~_k^CGS.

Algorithm 5. CGS method under the preconditioned system:
x~_0 is an initial guess, r~_0 = b~ - A~ x~_0, set beta~_{-1} = 0,
(r~*_0, r~_0) != 0, e.g., r~*_0 = r~_0,
For k = 0, 1, 2, ... until convergence, Do:
  u~_k = r~_k + beta~_{k-1} q~_{k-1},
  p~_k = u~_k + beta~_{k-1} (q~_{k-1} + beta~_{k-1} p~_{k-1}),
  alpha~_k = (r~*_0, r~_k) / (r~*_0, A~ p~_k),
  q~_k = u~_k - alpha~_k A~ p~_k,
  x~_{k+1} = x~_k + alpha~_k (u~_k + q~_k),
  r~_{k+1} = r~_k - alpha~_k A~ (u~_k + q~_k),
  beta~_k = (r~*_0, r~_{k+1}) / (r~*_0, r~_k),
End Do

The conventional PCGS algorithm (Algorithm 6) is derived via the CGS method, as shown in Figure 1, but this algorithm does not reflect the logic of Subsection 2.2 in its preconditioning conversion. In contrast, our proposed improved PCGS algorithm (Algorithm 7) directly applies the CGS derivation from BiCG to the preconditioned system, thus maintaining the logic of Subsection 2.2.

3.2. Conventional PCGS and Its Issue

The conventional PCGS algorithm is adopted in many documents and numerical libraries [4] [9] [11]. It is derived by applying the following preconditioning conversion to Algorithm 5:

A~ = M_L^{-1} A M_R^{-1},  x~_k = M_R x_k,  b~ = M_L^{-1} b,  r~_k = M_L^{-1} r_k,  (50)
r~*_k = M_L^T r*_k,  p~_k = M_L^{-1} p_k,  u~_k = M_L^{-1} u_k,  q~_k = M_L^{-1} q_k.

This gives the following Algorithm 6 ("Conventional PCGS" in Figure 1).

Algorithm 6. Conventional PCGS algorithm:
x_0 is an initial guess, r_0 = b - A x_0, set beta_{-1} = 0,
(r*_0, r_0) != 0, e.g., r*_0 = r_0,
For k = 0, 1, 2, ... until convergence, Do:
  u_k = r_k + beta_{k-1} q_{k-1},
  p_k = u_k + beta_{k-1} (q_{k-1} + beta_{k-1} p_{k-1}),
  alpha_k = (r*_0, r_k) / (r*_0, A M^{-1} p_k),  (51)

  q_k = u_k - alpha_k A M^{-1} p_k,
  x_{k+1} = x_k + alpha_k M^{-1} (u_k + q_k),
  r_{k+1} = r_k - alpha_k A M^{-1} (u_k + q_k),
  beta_k = (r*_0, r_{k+1}) / (r*_0, r_k),  (52)
End Do

This PCGS algorithm was described in [4], which proposed the BiCGStab method, and it has been employed as a standard approach up to now. This version of PCGS seems, on the surface, to be a compliant algorithm, because the operation (r*_0, r_k) in (51) and (52) does not include the preconditioning operator M^{-1} under the conversions r~_k = M_L^{-1} r_k and r~*_k = M_L^T r*_k from (50). However, if we apply r~*_k = M_L^T r*_k from (50) to the preconditioning conversion of the shadow residual vector in the BiCG method, we obtain

r*_k = M_L^{-T} r~*_k.  (53)

This is different from the conversion given by (40), and we cannot obtain coefficients equivalent to (43) and (44) using (53).

3.3. Derivation of the Improved PCGS from alpha~_k and beta~_k

In this subsection, we present an improved PCGS algorithm ("Improved PCGS" in Figure 1). We formulate this algorithm by applying the CGS derivation process directly to the BiCG method under the preconditioned system (Algorithm 3). The polynomials (32)-(35) of the residual vectors and the probing direction vectors are substituted into the numerators and denominators of alpha~_k and beta~_k. We have

(r~*_k, r~_k) = (R~_k(A~^T) r~*_0, R~_k(A~) r~_0) = (r~*_0, R~_k^2(A~) r~_0),  (54)
(p~*_k, A~ p~_k) = (P~_k(A~^T) r~*_0, A~ P~_k(A~) r~_0) = (r~*_0, A~ P~_k^2(A~) r~_0),  (55)

and apply the following Theorem 8.

Theorem 8. The PCGS coefficients alpha~_k^CGS and beta~_k^CGS are equivalent to the corresponding coefficients in PBiCG under certain transformation and substitution operations based on the bi-orthogonality and bi-conjugacy conditions under the preconditioned system.

Proof. We apply

r~_k^CGS = R~_k^2(A~) r~_0,  p~_k^CGS = P~_k^2(A~) r~_0  (56)

to (54) and (55). Then

alpha~_k^BiCG = (r~*_k, r~_k) / (p~*_k, A~ p~_k) = (r~*_0, r~_k^CGS) / (r~*_0, A~ p~_k^CGS) = alpha~_k^CGS,  (57)
beta~_k^BiCG = (r~*_{k+1}, r~_{k+1}) / (r~*_k, r~_k) = (r~*_0, r~_{k+1}^CGS) / (r~*_0, r~_k^CGS) = beta~_k^CGS.  (58)

The PCGS method is derived from the CGS recurrences under the preconditioned system by Theorem 9.

Theorem 9. The PCGS method is derived from the linear system's recurrence relations in the BiCG method under the preconditioned system, using the property of equivalence between the coefficients alpha~_k, beta~_k in PCGS and PBiCG. However, the solution vector x~_k^CGS is derived from a recurrence relation based on r~_k^CGS = b~ - A~ x~_k^CGS.

Proof. The coefficients alpha~_k and beta~_k in PBiCG and PCGS were derived in Theorem 8. The recurrence relations (29) and (30) for PBiCG are squared to give:

R~_k^2(lambda~) = R~_{k-1}^2(lambda~) - 2 alpha~_{k-1} lambda~ P~_{k-1}(lambda~) R~_{k-1}(lambda~) + alpha~_{k-1}^2 lambda~^2 P~_{k-1}^2(lambda~),  (59)
P~_k^2(lambda~) = R~_k^2(lambda~) + 2 beta~_{k-1} P~_{k-1}(lambda~) R~_k(lambda~) + beta~_{k-1}^2 P~_{k-1}^2(lambda~).  (60)

Further, we can apply r~_k^CGS = R~_k^2(A~) r~_0 and p~_k^CGS = P~_k^2(A~) r~_0 from (56), and substitute u~_k = P~_k(A~) R~_k(A~) r~_0 and q~_k = P~_k(A~) R~_{k+1}(A~) r~_0.

The following Proposition 10 and Corollary 2 are given as a supplementary explanation under the preconditioned system.

Proposition 10. There exist the following relations:

r~_0^CGS = r~_0,  (61)
r~*_0^CGS = r~*_0,  (62)

where r~_0^CGS is the initial residual vector at k = 0 in the iterative part of PCGS, r~*_0^CGS is the initial shadow residual vector in PCGS, r~_0 is the initial residual vector, and r~*_0 is the initial shadow residual vector.

Proof. Equation (61) follows because (56) for k = 0 gives r~_0^CGS = R~_0^2(A~) r~_0 = r~_0; here R~_0^2(A~) = I by (28). Equation (62) is derived as follows. Applying (56) to (54), we obtain

(r~*_k, r~_k) = (R~_k(A~^T) r~*_0, R~_k(A~) r~_0) = (r~*_0, R~_k^2(A~) r~_0).

This equation shows that the inner product of the PCGS on the right is obtained from the inner product of the PBiCG on the left. Therefore, the r~*_0 that composes the polynomial R~_k(A~^T) r~*_0 expressing the shadow residual vector of the PBiCG is the same as r~*_0 in PCGS. Hereafter, r~*_0^CGS can be represented by r~*_0 to the extent that neither can be distinguished.

Corollary 2. There exists the following relation:

p~_0^CGS = r~_0^CGS = r~_0,

where p~_0^CGS is the probing direction vector in PCGS, r~_0^CGS is the initial residual vector at k = 0 in the iterative part of PCGS, and r~_0 is the initial residual vector.

The preconditioning conversions of r~_k^CGS and p~_k^CGS are subjected to the same treatment as the PBiCG preconditioning conversions of r~_k and p~_k in (40) and (45), and r~*_0 is converted as in (45). Then

alpha~_k^CGS = (r~*_0, r~_k) / (r~*_0, A~ p~_k) = (M_R^{-T} r*_0^P, M_L^{-1} r_k) / (M_R^{-T} r*_0^P, M_L^{-1} A M_R^{-1} M_R p_k) = (r*_0^P, M^{-1} r_k) / (r*_0^P, M^{-1} A p_k),  (63)
beta~_k^CGS = (r~*_0, r~_{k+1}) / (r~*_0, r~_k) = (r*_0^P, M^{-1} r_{k+1}) / (r*_0^P, M^{-1} r_k).  (64)

As a consequence, the following improved PCGS algorithm is derived. [Footnote 7: We apply the superscript P to alpha_k, beta_k, and the relevant vectors to denote this algorithm.]

Algorithm 7. Improved preconditioned CGS algorithm:
x_0 is an initial guess, r_0 = b - A x_0, set beta_{-1} = 0,
(r*_0, r_0) != 0, e.g., r*_0 = M^{-1} r_0,
For k = 0, 1, 2, ... until convergence, Do:
  u_k = M^{-1} r_k + beta_{k-1} q_{k-1},

  p_k = u_k + beta_{k-1} (q_{k-1} + beta_{k-1} p_{k-1}),
  alpha_k = (r*_0, M^{-1} r_k) / (r*_0, M^{-1} A p_k),
  q_k = u_k - alpha_k M^{-1} A p_k,
  x_{k+1} = x_k + alpha_k (u_k + q_k),
  r_{k+1} = r_k - alpha_k A (u_k + q_k),
  beta_k = (r*_0, M^{-1} r_{k+1}) / (r*_0, M^{-1} r_k),
End Do

Algorithm 7 can also be derived by applying the following preconditioning conversion to Algorithm 5. Here, we treat the preconditioning conversions of u~_k and q~_k the same as the conversion of p~_k:

A~ = M_L^{-1} A M_R^{-1},  x~_k = M_R x_k,  b~ = M_L^{-1} b,  r~_k = M_L^{-1} r_k,
r~*_0 = M_R^{-T} r*_0,  p~_k = M_R p_k,  u~_k = M_R u_k,  q~_k = M_R q_k.

The number of preconditioning operations in the iterative part of Algorithm 7 is the same as that in Algorithm 6.

4. Numerical Experiments

In this section, we compare the conventional and improved PCGS algorithms numerically. The test problems were generated by building real unsymmetric matrices corresponding to linear systems taken from the Tim Davis collection [12] and the Matrix Market [13]. The RHS vector b of (1) was generated by setting all elements of the exact solution vector x_exact to 1.0 and substituting this into (1). The solution algorithms were implemented using the sequential mode of the Lis numerical computation library (version 1.1.2) [14] in double precision, with the compiler options registered in the Lis Makefile. Furthermore, we set the initial solution to x_0 = 0 and considered the algorithm to have converged when ||r_{k+1}||_2 / ||b||_2 <= 1.0e-12, where r_k is the residual vector in the algorithm and k is the iteration number. The maximum number of iterations was set to the size of the coefficient matrix. The numerical experiments were executed on a DELL Precision T7400 (Intel Xeon E5420, 2.5 GHz CPU, 16 GB RAM) running CentOS (kernel 2.6.18) and the Intel icc 11.0 compiler.

The results using the non-preconditioned CGS are listed in Table 1. The results given by the conventional PCGS and the improved PCGS are listed in Tables 2-5. Each table adopts a different preconditioner in Lis [14]: (Point-)Jacobi, ILU(0) [Footnote 8: The value of zero for ILU(0) indicates the fill-in level, that is, no fill-in.], SAINV, and Crout ILU. In these tables, significant advantages of one algorithm over the other are emphasized by bold font. [Footnote 9: Because the principal argument presented in this paper concerns the structure of the algorithm for PCGS based on an improved preconditioning transformation procedure, the time required to obtain numerical solutions and the convergence history graphs (Figures 2-5) are provided as a reference. As mentioned before, there is almost no difference between Algorithm 6 and Algorithm 7 as regards the number of operations in the iteration, which would exert a strong influence on the computation time.] Additionally, matrix names given in italic font in Table 1 encounter some difficulties. The evolution of the convergence for each preconditioner is shown in Figures 2-5. In this paper, we do not compare the computational speed of these preconditioners.

In many cases, the results given by the improved PCGS are better than those from the conventional algorithm. We should pay particular attention to the results for the matrices mcca, mcfe, and watt_1. In these cases, it appears that the conventional PCGS converges faster with any preconditioner, but the TRE values are worse than those from the improved algorithm. The iteration number for the conventional PCGS is therefore not emphasized by bold font in these instances. The consequences of this anomaly are worth investigating further; this will be the subject of future work.
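The two preconditioned algorithms compared in this section, Algorithm 6 (conventional PCGS) and Algorithm 7 (improved PCGS), can both be sketched compactly. In the sketch below (Python/NumPy), the dense matrices, the explicit inverse of M, and the Jacobi choice M = diag(A) in the test are our own illustrative shortcuts; a production implementation such as the paper's applies the preconditioner through sparse solves. On a well-conditioned toy both variants converge; the differences the paper measures (TRE behavior under various preconditioners) only show up on harder problems:

```python
import numpy as np

def pcgs_conventional(A, b, M, tol=1e-12, maxiter=None):
    """Sketch of Algorithm 6: r*_0 = r_0; the preconditioner enters as A M^-1 p."""
    n = b.size
    Minv = np.linalg.inv(M)            # stands in for a preconditioner solve
    x = np.zeros(n)
    r = b - A @ x
    rs0 = r.copy()
    p = np.zeros(n); q = np.zeros(n); beta = 0.0
    for _ in range(maxiter or 2 * n):
        u = r + beta * q
        p = u + beta * (q + beta * p)
        AMp = A @ (Minv @ p)
        rho = rs0 @ r                  # (r*_0, r_k): no M^-1 in the inner products
        alpha = rho / (rs0 @ AMp)      # (51)
        q = u - alpha * AMp
        w = Minv @ (u + q)
        x = x + alpha * w
        r = r - alpha * (A @ w)
        beta = (rs0 @ r) / rho         # (52)
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
    return x

def pcgs_improved(A, b, M, tol=1e-12, maxiter=None):
    """Sketch of Algorithm 7: the inner products carry M^-1; here r*_0 = M^-1 r_0
    (an illustrative choice consistent with the coefficient conversions (63)-(64))."""
    n = b.size
    Minv = np.linalg.inv(M)            # stands in for a preconditioner solve
    x = np.zeros(n)
    r = b - A @ x
    rs0 = Minv @ r
    p = np.zeros(n); q = np.zeros(n); beta = 0.0
    for _ in range(maxiter or 2 * n):
        z = Minv @ r                   # M^-1 r_k
        u = z + beta * q
        p = u + beta * (q + beta * p)
        MAp = Minv @ (A @ p)
        rho = rs0 @ z                  # (r*_0, M^-1 r_k)
        alpha = rho / (rs0 @ MAp)      # (63)
        q = u - alpha * MAp
        x = x + alpha * (u + q)
        r = r - alpha * (A @ (u + q))
        beta = (rs0 @ (Minv @ r)) / rho    # (64)
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
    return x
```

Comparing the two bodies makes the paper's structural point concrete: each iteration performs the same number of preconditioner applications, and the only difference is where M^{-1} sits, inside the recurrences (conventional) versus inside the inner products defining alpha_k and beta_k (improved).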

Table 1. Numerical results for a variety of test problems (CGS, Algorithm 2).

Matrix          N       NNZ      Iter    TRR    TRE    Time [s]
arc130          130     1037     11      122    75     1.18e-04
bfwa782         782     7514     32      1129   1194   1.71e-02
cryg2500        2500    12349    No convergence
epb1            14734   95053    77      745    65     5.93e-01
jpwh_991        991     6027     Breakdown
mcca            180     2659     No convergence
mcfe            765     24382    No convergence
memplus         17758   99147    1334    916    676    1.33e+00
olm1000         1000    3996     No convergence
olm5000         5000    19996    No convergence
pde900          900     4380     113     987    149    4.13e-03
pde2961         2961    14585    256     949    119    3.6e-02
sherman2        1080    23094    No convergence
sherman3        5005    20033    No convergence
sherman5        3312    20793    1927    136    969    3.34e-01
viscoplastic2   32769   381326   81      134    818    2.2e+00
watt_1          1856    11360    36      121    67     2.72e-02

In this table, N is the problem size and NNZ is the number of nonzero elements. The items in each column are, from left to right: the number of iterations required to converge (denoted Iter); the true relative residual 2-norm in log10 (denoted TRR), calculated as \log_{10}(\|b - A\hat{x}\|_2 / \|b\|_2), where \hat{x} is the numerical solution; the true relative error 2-norm in log10 (denoted TRE), calculated from the numerical solution and the exact solution as \log_{10}(\|\hat{x} - x_{exact}\|_2 / \|x_{exact}\|_2); and, as a reference, the CPU time (denoted Time [s]). Several matrix names are represented in italic font; these indicate certain situations such as breakdown, no convergence, or insufficient accuracy in TRR or TRE.

5. Conclusions

In this paper, we have developed an improved PCGS algorithm by applying the procedure for deriving CGS to the BiCG method under a preconditioned system, and we have also presented some mathematical theorems underlying the logic of this deriving process. The improved PCGS does not increase the number of preconditioning operations in the iterative part of the algorithm. Our numerical results established that solutions obtained with the proposed algorithm are superior to those from the conventional algorithm for a variety of preconditioners. However, the improved algorithm may still break down during the iterative procedure. This is an artefact of certain characteristics of the non-preconditioned BiCG and CGS methods, mainly the operations based on the bi-orthogonality and bi-conjugacy conditions. Nevertheless, this improved logic can be applied to other bi-Lanczos-based algorithms with minimal residual operations. In future work, we will analyze the mechanism of the conventional and improved PCGS algorithms and consider other variations of this algorithm. Furthermore, we will consider settings of the initial shadow residual vector r_0^\sharp other than r_0^\sharp = r_0, to ensure that (r_0^\sharp, r_0) \neq 0 holds.
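The TRR and TRE measures reported in Tables 1-5 can be computed directly from their definitions. The following sketch is illustrative only; the bidiagonal test matrix and the 1e-9 perturbation of the exact solution are hypothetical.

```python
import numpy as np

def trr(A, b, x_hat):
    """True relative residual: log10(||b - A x_hat||_2 / ||b||_2)."""
    return np.log10(np.linalg.norm(b - A @ x_hat) / np.linalg.norm(b))

def tre(x_hat, x_exact):
    """True relative error: log10(||x_hat - x_exact||_2 / ||x_exact||_2)."""
    return np.log10(np.linalg.norm(x_hat - x_exact) / np.linalg.norm(x_exact))

# Hypothetical illustration: exact solution of all ones (as in the paper's
# setup) and a numerical solution perturbed by 1e-9 in every component.
n = 100
A = np.diag(np.full(n, 4.0)) + np.diag(np.ones(n - 1), 1)
x_exact = np.ones(n)
b = A @ x_exact
x_hat = x_exact + 1e-9
trr_val = trr(A, b, x_hat)
tre_val = tre(x_hat, x_exact)
```

Here both measures come out near -9, since the perturbation is a uniform 1e-9 of a unit solution; TRR and TRE need not agree in general, which is exactly the point of reporting both.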

Table 2. Numerical results for a variety of test problems (Jacobi-PCGS).

                Conv. PCGS (Algorithm 6)           Impr. PCGS (Algorithm 7)
Matrix          Iter   TRR    TRE    Time          Iter   TRR    TRE    Time
arc130          5      1591   161    6.88e-05      5      1587   166    7.28e-05
bfwa782         227    115    123    1.25e-02      26     1198   122    1.41e-02
cryg2500        No convergence                     No convergence
epb1            578    766    644    4.56e-01      591    759    686    4.68e-01
jpwh_991        Breakdown                          Breakdown
mcca            84     653    93     1.44e-03      12     893    1261   2.6e-03
mcfe            764    356    297    7.95e-02      98     626    879    9.54e-02
memplus         213    121    96     2.19e-01      23     1241   938    2.37e-01
olm1000         No convergence                     No convergence
olm5000         No convergence                     No convergence
pde900          113    744    763    4.3e-03       1      1122   1175   3.81e-03
pde2961         26     827    868    2.55e-02      237    644    661    2.92e-02
sherman2        No convergence                     No convergence
sherman3        112    73     84     2.15e-01      827    674    759    1.75e-01
sherman5        131    1229   1263   2.35e-02      128    124    123    2.28e-02
viscoplastic2   66     999    8      1.97e+00      645    1245   114    1.88e+00
watt_1          8      1251   575    7.53e-03      79     1261   544    7.53e-03

Figure 2. Convergence histories of the relative residual 2-norm (Jacobi-PCGS). Red line: conventional PCGS; blue line: improved PCGS.
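Tables 2-5 compare Algorithm 6 and Algorithm 7 under several preconditioners. As a reference point, the conventional preconditioned CGS loop in the style of Templates [9] can be sketched as follows; this is a minimal NumPy illustration with a hypothetical diagonally dominant test matrix and Jacobi M = diag(A), not the Lis implementation used for the tables.

```python
import numpy as np

def conventional_pcgs(A, b, M_inv, tol=1e-12, max_iter=None):
    """Conventional preconditioned CGS in the style of Templates [9]:
    the preconditioner is applied to p and to u + q, while the scalar
    products use the unpreconditioned residual."""
    n = b.shape[0]
    max_iter = n if max_iter is None else max_iter
    x = np.zeros(n)
    r = b - A @ x
    r_sh = r.copy()            # shadow residual, chosen as r0
    rho_old = 1.0
    p = np.zeros(n)
    q = np.zeros(n)
    for k in range(max_iter):
        rho = r_sh @ r
        beta = 0.0 if k == 0 else rho / rho_old
        u = r + beta * q
        p = u + beta * (q + beta * p)
        p_hat = M_inv(p)           # solve M p_hat = p
        v_hat = A @ p_hat
        alpha = rho / (r_sh @ v_hat)
        q = u - alpha * v_hat
        u_hat = M_inv(u + q)       # solve M u_hat = u + q
        x = x + alpha * u_hat
        r = r - alpha * (A @ u_hat)
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k + 1
        rho_old = rho
    return x, max_iter

# Hypothetical test problem with Jacobi preconditioner M = diag(A).
rng = np.random.default_rng(1)
n = 50
A = 10.0 * np.eye(n) + 0.1 * rng.standard_normal((n, n))
b = A @ np.ones(n)
x, iters = conventional_pcgs(A, b, lambda v: v / A.diagonal())
```

As in Algorithm 7, there are two applications of the preconditioner per iteration; the two variants differ in where M^{-1} enters the recurrences, not in its cost.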

Table 3. Numerical results for a variety of test problems (ILU(0)-PCGS).

                Conv. PCGS (Algorithm 6)           Impr. PCGS (Algorithm 7)
Matrix          Iter   TRR    TRE    Time          Iter   TRR    TRE    Time
arc130          2      159    635    2.94e-04      3      1626   117    2.98e-04
bfwa782         93     936    129    1.18e-02      78     1282   1248   1.1e-02
cryg2500        No convergence                     385    847    422    8.49e-02
epb1            124    1138   134    2.14e-01      129    923    854    2.29e-01
jpwh_991        Breakdown                          16     1244   1253   2.72e-03
mcca            7      153    111    5.99e-04      7      998    117    6.18e-04
mcfe            1      1283   1158   5.7e-03       9      1215   161    5.52e-03
memplus         33     1213   136    7.22e-01      35     1212   161    7.18e-01
olm1000         No convergence                     34     1249   919    2.85e-03
olm5000         No convergence                     34     122    85     1.41e-02
pde900          27     1361   1427   2.57e-03      27     1319   1392   2.59e-03
pde2961         53     157    1134   1.49e-02      58     1178   1265   1.62e-02
sherman2        12     136    1146   6.8e-03       11     142    1155   5.91e-03
sherman3        13     982    1157   4.39e-02      96     182    1334   4.1e-02
sherman5        31     1368   1289   1.36e-02      3      1254   1242   1.31e-02
viscoplastic2   812    755    468    7.1e+00       844    118    869    7.18e+00
watt_1          27     131    596    6.17e-03      35     1211   977    7.75e-03

Figure 3. Convergence histories of the relative residual 2-norm (ILU(0)-PCGS). Red line: conventional PCGS; blue line: improved PCGS.

Table 4. Numerical results for a variety of test problems (SAINV-PCGS).

                Conv. PCGS (Algorithm 6)           Impr. PCGS (Algorithm 7)
Matrix          Iter   TRR    TRE    Time          Iter   TRR    TRE    Time
arc130          4      1754   875    3.22e-04      5      1927   112    3.24e-04
bfwa782         19     949    975    1.72e-02      16     1234   127    1.73e-02
cryg2500        No convergence                     No convergence
epb1            69     124    1191   2.9e+00       69     1231   1241   2.89e+00
jpwh_991        Breakdown                          41     122    136    7.37e-03
mcca            81     532           2.26e-03      111    86     1426   3.4e-03
mcfe            764    356    297    1.1e-01       98     626    879    1.29e-01
memplus         34     123    12     1.18e+00      34     1232   124    1.2e+00
olm1000         No convergence                     No convergence
olm5000         No convergence                     No convergence
pde900          53     137    1116   7.25e-03      51     1243   133    7.28e-03
pde2961         11     1162   122    6.91e-02      96     128    1255   6.66e-02
sherman2        No convergence                     No convergence
sherman3        874    749    834    5.53e-01      629    949    175    4.24e-01
sherman5        157    122    1119   9.2e-02       127    1257   123    8.9e-02
viscoplastic2   No convergence                     No convergence
watt_1          1      1237   45     1.49e+00      2      1481   1131   1.53e+00

Figure 4. Convergence histories of the relative residual 2-norm (SAINV-PCGS). Red line: conventional PCGS; blue line: improved PCGS.

Table 5. Numerical results for a variety of test problems (Crout ILU-PCGS).

                Conv. PCGS (Algorithm 6)           Impr. PCGS (Algorithm 7)
Matrix          Iter   TRR    TRE    Time          Iter   TRR    TRE    Time
arc130          2      1242   234    1.63e-04      4      1626   1112   1.82e-04
bfwa782         122    129    125    2.3e-02       149    1256   1315   2.42e-02
cryg2500        No convergence                     92     76     267    2.9e-01
epb1            19     1191   1169   2.12e-01      16     121    1179   2.6e-01
jpwh_991        Breakdown                          15     1253   1329   3.46e-03
mcca            1      975    142    5.9e-04       12     91     1541   6.4e-04
mcfe            23     1171   136    6.24e-03      26     1173   1399   6.82e-03
memplus         92     126    1      1.59e-01      88     1252   157    1.53e-01
olm1000         No convergence                     38     1224   84     3.37e-03
olm5000         67     97     793    2.75e-02      75     1164   912    3.6e-02
pde900          13     1256   1266   1.96e-03      12     1242   1243   1.88e-03
pde2961         24     1313   1316   9.18e-03      25     1248   1277   9.5e-03
sherman2        91     972    624    2.19e-01      841    161    871    2.5e-01
sherman3        79     165    121    3.94e-02      74     1168   139    3.73e-02
sherman5        4      134    1195   1.53e-02      4      133    1293   1.53e-02
viscoplastic2   No convergence                     No convergence
watt_1          19     135    575    6.63e-03      25     1525   943    8.12e-03

Figure 5. Convergence histories of the relative residual 2-norm (Crout ILU-PCGS). Red line: conventional PCGS; blue line: improved PCGS.

Acknowledgements

This work is partially supported by a Grant-in-Aid for Scientific Research (C), No. 2539145, from MEXT, Japan.

References

[1] Sonneveld, P. (1989) CGS, a Fast Lanczos-Type Solver for Nonsymmetric Linear Systems. SIAM Journal on Scientific and Statistical Computing, 10, 36-52. http://dx.doi.org/10.1137/0910004
[2] Fletcher, R. (1976) Conjugate Gradient Methods for Indefinite Systems. In: Watson, G., Ed., Numerical Analysis Dundee 1975, Lecture Notes in Mathematics, Vol. 506, Springer-Verlag, Berlin/New York, 73-89.
[3] Lanczos, C. (1952) Solution of Systems of Linear Equations by Minimized Iterations. Journal of Research of the National Bureau of Standards, 49, 33-53. http://dx.doi.org/10.6028/jres.049.006
[4] Van der Vorst, H.A. (1992) Bi-CGSTAB: A Fast and Smoothly Converging Variant of Bi-CG for the Solution of Nonsymmetric Linear Systems. SIAM Journal on Scientific and Statistical Computing, 13, 631-644. http://dx.doi.org/10.1137/0913035
[5] Zhang, S.-L. (1997) GPBi-CG: Generalized Product-Type Methods Based on Bi-CG for Solving Nonsymmetric Linear Systems. SIAM Journal on Scientific Computing, 18, 537-551. http://dx.doi.org/10.1137/S1064827592236313
[6] Itoh, S. and Sugihara, M. (2010) Systematic Performance Evaluation of Linear Solvers Using Quality Control Techniques. In: Naono, K., Teranishi, K., Cavazos, J. and Suda, R., Eds., Software Automatic Tuning: From Concepts to State-of-the-Art Results, Springer, 135-152.
[7] Itoh, S. and Sugihara, M. (2013) Preconditioned Algorithm of the CGS Method Focusing on Its Deriving Process. Transactions of the Japan SIAM, 23, 253-286. (In Japanese)
[8] Hestenes, M.R. and Stiefel, E. (1952) Methods of Conjugate Gradients for Solving Linear Systems. Journal of Research of the National Bureau of Standards, 49, 409-435. http://dx.doi.org/10.6028/jres.049.044
[9] Barrett, R., Berry, M., Chan, T.F., Demmel, J., Donato, J., Dongarra, J., et al. (1994) Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. SIAM. http://dx.doi.org/10.1137/1.9781611971538
[10] Van der Vorst, H.A. (2003) Iterative Krylov Methods for Large Linear Systems. Cambridge University Press, Cambridge. http://dx.doi.org/10.1017/CBO9780511615115
[11] Meurant, G. (2005) Computer Solution of Large Linear Systems. Elsevier, Amsterdam.
[12] Davis, T.A. The University of Florida Sparse Matrix Collection. http://www.cise.ufl.edu/research/sparse/matrices/
[13] Matrix Market Project. http://math.nist.gov/matrixmarket/
[14] SSI Project, Lis. http://www.ssisc.org/lis/