
Technical Report WM-CS-2009-06
College of William & Mary
Department of Computer Science

Extending the eigCG algorithm to non-symmetric Lanczos for linear systems with multiple right-hand sides

Abdou M. Abdel-Rehim, Andreas Stathopoulos, Kostas Orginos

July 20, 2009

EXTENDING THE EIGCG ALGORITHM TO NON-SYMMETRIC LANCZOS FOR LINEAR SYSTEMS WITH MULTIPLE RIGHT-HAND SIDES

ABDOU M. ABDEL-REHIM, ANDREAS STATHOPOULOS, AND KOSTAS ORGINOS

This work was supported by the National Science Foundation grant CCF-0728915, the DOE Jefferson Lab, and the Jeffress Memorial Trust grant J-813. Departments of Physics and Computer Science, College of William and Mary, Williamsburg, Virginia 23187-8795 (amrehim@cs.wm.edu); Department of Computer Science, College of William and Mary, Williamsburg, Virginia 23187-8795 (andreas@cs.wm.edu); Department of Physics, College of William and Mary, Williamsburg, Virginia 23187-8795 (kostas@jlab.org).

Abstract. The technique that was used to build the eigCG algorithm for symmetric linear systems is extended to the non-symmetric case using the BiCG and QMR algorithms. We show that, as in the symmetric case, we can build an algorithm that computes some eigenvalues of the non-symmetric system using only a window of the BiCG residuals while simultaneously solving the linear system. The new algorithm, called eigBICG/QMR, is tested in MATLAB on non-symmetric matrices from the Matrix Market and from lattice QCD. We show that, with a moderate memory requirement, eigBICG/QMR is capable of simultaneously solving the linear system and computing a few eigenvalues, as well as left and right eigenvectors, with accuracy similar to that of the un-restarted bi-Lanczos algorithm. For non-symmetric linear systems with multiple right-hand sides, we give an algorithm, analogous to the symmetric case, that solves the systems in two phases. In the first phase, a subset of the right-hand sides is solved using eigBICG/QMR, where the accuracy of the computed eigenvectors is improved from one system to the next and new eigenvectors are computed. In the second phase, the computed eigenvectors are used to deflate BiCGSTAB and accelerate the solution of the remaining right-hand sides. Deflation using both left-right and Galerkin projections is studied. The algorithm for multiple right-hand sides, called Incremental eigBICG/QMR, is shown to speed up BiCGSTAB by a factor of 2-3 on the test matrices used. Removal of critical slowing down in lattice QCD applications is discussed and compared to the symmetric case. We also give a version of the algorithm for J-symmetric matrices, for which matrix-vector multiplication with $A^H$ (the Hermitian conjugate of $A$) is avoided and storage of the left eigenvectors is not needed.

1. Introduction. Many scientific and engineering applications require the solution of linear systems of equations of the form

$$ Ax_i = b_i, \quad i = 1, 2, \ldots, n_{rhs}, \qquad (1.1) $$

for some large sparse matrix $A$. Krylov subspace iterative methods are fundamental tools for solving such systems. Traditional approaches solve each system one by one. Unknowingly, such methods regenerate search directions within previously explored subspaces, thus wasting iterations on successive right-hand sides. Sharing information between systems has long been recognized as the key idea for speeding up their solution [1]. Different approaches have been introduced in the literature, including block methods [2]-[4], seed methods [5]-[7], and deflation methods [8]-[12]. Block methods build a Krylov subspace for all the right-hand sides and work on them at once. Seed methods use the Krylov subspace built while solving the current right-hand side as a projection space for subsequent right-hand sides, while deflation methods carry information about the spectrum of the matrix $A$ between successive right-hand sides. For ill-conditioned problems, the slow convergence of standard Krylov methods is mainly due to small eigenvalues of $A$, and deflation methods are very efficient for such systems. The algorithms we study here belong to the class of deflation methods.
For a symmetric positive definite (SPD) matrix $A$, we recently proposed a new algorithm based on the Conjugate Gradient (CG) algorithm that simultaneously solves the linear systems, computes eigenvalues, and applies deflation to speed up the solution for multiple right-hand sides [13]. The new algorithm, called Incremental eigCG, has many advantages. First, it simultaneously solves the linear system and computes a few small eigenvalues without affecting the convergence of the linear solver, thus avoiding both a separate eigensolver call to obtain the deflation eigenvectors and a restart of CG on the linear system, which would slow its convergence. Second, the algorithm computes the eigenvalues in a memory-efficient way, using only a small window of the CG residuals; the window is restarted by combining current approximate eigenvectors with new CG residuals to obtain better approximations of the eigenvectors. Third, the number and accuracy of the computed eigenvectors can be increased while solving more right-hand sides. Finally, the algorithm is built on the CG algorithm, whose three-term recurrence means low memory cost and optimal convergence properties. Incremental eigCG was tested on various problems of different sizes and was found to give excellent results, with large speedups for multiple right-hand sides and with eigenvalue accuracy comparable to un-restarted Lanczos. For lattice QCD, the algorithm was found to remove the critical slowing down that occurs as quark masses approach zero.

Non-symmetric problems are very common in many applications, and motivated by the success of Incremental eigCG in the symmetric case, we give in this report an extension to the case where $A$ is non-symmetric. Our goal is to build an algorithm for the non-symmetric case along the same lines as the symmetric one. For that purpose, we base our new algorithm on the Bi-Conjugate Gradient (BiCG) [1] and Quasi-Minimal Residual (QMR) [14] algorithms. These algorithms use a three-term recurrence and can compute both left and right eigenvectors simultaneously. Another option was to base the algorithm on Bi-Conjugate-Gradient-Stabilized (BiCGSTAB) [15]. The problem with this option is that, even though one can relate the coefficients of the BiCGSTAB algorithm to those of the BiCG algorithm, it is not clear how to obtain the bi-Lanczos vectors from the BiCGSTAB residuals, which is needed in order to compute eigenvectors. Besides the computation of left and right eigenvectors, two other differences from the symmetric case arise for systems with multiple right-hand sides. First, in the symmetric case, CG was used as the linear solver in both phases of the solution (solving linear systems and computing eigenvalues in the first phase, and solving linear systems with deflation in the second phase). In the non-symmetric case, BiCG will be used only in the first phase; for the second phase it is more efficient to deflate BiCGSTAB, as it is normally faster than BiCG. The second difference concerns the projection used for deflation. In the symmetric case, an orthogonal projection was used, and it efficiently deflates small eigenvalues. In the non-symmetric case, BiCGSTAB can be deflated using left-right or Galerkin projections, as will be discussed.

Non-symmetric linear systems and eigenvalue problems are generally harder to solve than SPD systems, for several reasons. In general the eigenvalues are complex, and the matrices can be defective. If one is only interested in solving the linear system, then, in principle, a non-symmetric linear system $Ax = b$ can be converted into the SPD system $A^H Ax = A^H b$, which could be solved using Incremental eigCG. This procedure, however, has two disadvantages. First, every matrix-vector multiplication step includes a product with $A$ followed by a product with $A^H$, doubling the cost of matrix-vector products per iteration. Second, if $A$ has eigenvalues of magnitude less than 1, then multiplication with $A^H$ will roughly square those eigenvalues, leading to a larger condition number for the symmetrized problem and making its solution harder.

For multiple right-hand sides, however, if one is using deflation, then one has to compare the deflation efficiency in the two situations: on one hand, non-symmetric eigenvalues are harder to compute than symmetric ones; on the other, one has to consider the integrated speedup obtained after solving many right-hand sides. In this report, we compare solving non-symmetric linear systems with multiple right-hand sides directly, using the newly developed non-symmetric algorithm, against applying Incremental eigCG to the symmetrized system, for two lattice QCD examples.

Solving the non-symmetric problem directly also has another advantage over solving the symmetrized version, in addition to providing left and right eigenvectors of $A$. It concerns the case where $A$ is J-symmetric, meaning that $A$ satisfies the condition

$$ JA = A^H J \qquad (1.2) $$

for a Hermitian matrix $J$ that is simple, in the sense that multiplying a vector by $J$ is much faster than multiplying it by $A$. This case is important in many applications [16], including lattice QCD with Wilson fermions [17], where $A$ is the Wilson-Dirac matrix and $J$ is the Dirac matrix $\gamma_5$. In this case, we can implement two important simplifications. First, the multiplication with $A^H$ used in BiCG (QMR) is replaced by a faster multiplication with $J$. Second, no storage is needed for the left eigenvectors, as they are related to the right eigenvectors through J-symmetry. A J-symmetric version of our new algorithms for non-symmetric matrices is also given in this report.

In the following, $\bar{A}$, $A^T$, and $A^H$ denote the complex conjugate, the transpose, and the Hermitian conjugate of $A$, respectively. The dot product of two vectors $w$ and $v$ is denoted by

$$ \langle v, w \rangle = w^H v. \qquad (1.3) $$

For a vector $v$, we use $\|v\|$ for the 2-norm of $v$, i.e., $\|v\| = \sqrt{\langle v, v \rangle}$.

2. The non-symmetric Lanczos and related algorithms. In this section we briefly review the basics of the non-symmetric Lanczos algorithm and related algorithms for linear systems and eigenvalue problems. This establishes the background for the eigBICG algorithm.

2.1. The non-symmetric Lanczos algorithm. Consider the linear system of equations

$$ Ax = b, \qquad (2.1) $$

where $A$ is an $n \times n$ complex non-Hermitian matrix. For the non-symmetric Lanczos algorithm, we also have to consider the dual system of equations

$$ A^H \hat{x} = \hat{b}. \qquad (2.2) $$

Let $x_0$ and $\hat{x}_0$ be the initial guesses for the two systems. The initial residuals are $r_0 = b - Ax_0$ and $\hat{r}_0 = \hat{b} - A^H \hat{x}_0$. It is almost always the case that one is interested in solving only the system $Ax = b$; in that case we take $\hat{b} = b$, and the choice of $\hat{x}_0$, and consequently of $\hat{r}_0$, is arbitrary. However, in the J-symmetric case, a special choice of $\hat{r}_0$ leads to simplifications.

The non-symmetric Lanczos algorithm builds biorthogonal bases for the Krylov subspaces

$$ K_m(A, v_1) = \mathrm{span}\{v_1, Av_1, A^2 v_1, \ldots, A^{m-1} v_1\}, $$
$$ K_m(A^H, w_1) = \mathrm{span}\{w_1, A^H w_1, (A^H)^2 w_1, \ldots, (A^H)^{m-1} w_1\}, \qquad (2.3) $$

where $v_1 = r_0 / \|r_0\|$ and $w_1 = \hat{r}_0 / \langle \hat{r}_0, v_1 \rangle$, using three-term recurrences. It is given in Algorithm 1.

Algorithm 1: Non-symmetric Lanczos algorithm
  Set $b_0 = g_0 = 0$ and $v_0 = w_0 = 0$.
  for $j = 1, 2, \ldots$ till convergence do
    Compute $Av_j$ and $A^H w_j$.
    Set $a_j = \langle Av_j, w_j \rangle$.
    Compute $\tilde{v}_{j+1} = Av_j - a_j v_j - b_{j-1} v_{j-1}$ and $\tilde{w}_{j+1} = A^H w_j - \bar{a}_j w_j - \bar{g}_{j-1} w_{j-1}$.
    Set $g_j = \|\tilde{v}_{j+1}\|$ and $v_{j+1} = \tilde{v}_{j+1} / g_j$.
    Set $b_j = \langle v_{j+1}, \tilde{w}_{j+1} \rangle$ and $w_{j+1} = \tilde{w}_{j+1} / \bar{b}_j$.
  end for

Building the biorthogonal bases requires the storage of six vectors per iteration. Also, every iteration requires one matrix-vector multiplication with $A$ and another with $A^H$. Let $V_m$ be the matrix with columns $v_1, v_2, \ldots, v_m$ and $W_m$ the matrix with columns $w_1, w_2, \ldots, w_m$; these columns form the biorthogonal bases, i.e.,

$$ W_m^H V_m = I. \qquad (2.4) $$

The projection matrix $T^{(m)} = W_m^H A V_m$ is tridiagonal:

$$ T^{(m)} = W_m^H A V_m = \begin{pmatrix} a_1 & b_1 & & \\ g_1 & a_2 & \ddots & \\ & \ddots & \ddots & b_{m-1} \\ & & g_{m-1} & a_m \end{pmatrix}. \qquad (2.5) $$

The solution of the linear system Eq. (2.1) is given by

$$ x = x_0 + V_m y_m, \qquad (2.6) $$

where $y_m$ is determined according to a certain condition. In the case of BiCG, $y_m = y_m^b$ is determined from the condition that the new residual $r^{bicg} = b - Ax$ be orthogonal to $W_m$, i.e.,

$$ W_m^H r^{bicg} = 0, \qquad (2.7) $$

which gives $y_m^b$ as the solution of the system

$$ T^{(m)} y_m^b = \xi e_1^{(m)}, \qquad (2.8) $$

where $\xi = \|r_0\|$ and $e_1^{(m)} = [1, 0, 0, \ldots, 0]^T$ is the first unit vector of dimension $m$. In the case of QMR, $y_m = y_m^q$ is determined by solving the $(m+1)$-dimensional least-squares problem

$$ \min_{y_m^q} \| \xi e_1^{(m+1)} - \tilde{T}^{(m)} y_m^q \|, \qquad (2.9) $$

where $\tilde{T}^{(m)}$ has dimensions $(m+1) \times m$ (the tridiagonal $T^{(m)}$ extended by one more row) and $e_1^{(m+1)}$ is the $(m+1)$-dimensional unit vector that equals $e_1^{(m)}$ except for an additional 0 in the $(m+1)$-th component.
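For concreteness, a minimal NumPy sketch of Algorithm 1 is given below (a hypothetical illustration of ours, not the MATLAB code used for the tests in this report); it assumes no breakdown occurs and performs no look-ahead or re-biorthogonalization:

```python
import numpy as np

def nonsymmetric_lanczos(A, r0, rhat0, m):
    # Sketch of Algorithm 1: biorthogonal bases V, W (W^H V = I) and the
    # tridiagonal T = W^H A V, with <v, w> = w^H v as in Eq. (1.3).
    n = len(r0)
    V = np.zeros((n, m + 1), dtype=complex)
    W = np.zeros((n, m + 1), dtype=complex)
    T = np.zeros((m + 1, m + 1), dtype=complex)
    V[:, 0] = r0 / np.linalg.norm(r0)
    W[:, 0] = rhat0 / np.vdot(V[:, 0], rhat0)      # <rhat0, v1> = v1^H rhat0
    for j in range(m):
        av = A @ V[:, j]
        aw = A.conj().T @ W[:, j]
        a = np.vdot(W[:, j], av)                   # a_j = <A v_j, w_j>
        T[j, j] = a
        vt = av - a * V[:, j]
        wt = aw - np.conj(a) * W[:, j]
        if j > 0:
            vt -= T[j - 1, j] * V[:, j - 1]            # b_{j-1} v_{j-1}
            wt -= np.conj(T[j, j - 1]) * W[:, j - 1]   # conj(g_{j-1}) w_{j-1}
        g = np.linalg.norm(vt)                     # g_j; zero means breakdown
        V[:, j + 1] = vt / g
        b = np.vdot(wt, V[:, j + 1])               # b_j = <v_{j+1}, w~_{j+1}>
        W[:, j + 1] = wt / np.conj(b)
        T[j + 1, j], T[j, j + 1] = g, b
    return V[:, :m], W[:, :m], T[:m, :m]
```

On a small random test matrix one can check that `W.conj().T @ V` is close to the identity and `W.conj().T @ A @ V` close to `T`, until the loss of bi-orthogonality discussed below sets in.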

For non-symmetric eigenvalue problems, the Lanczos algorithm gives approximations to the left and right eigenvectors through the Rayleigh-Ritz procedure. To compute eigenvalues and eigenvectors with the Lanczos algorithm, we solve the $m \times m$ eigenvalue problems

$$ T^{(m)} Y_R^{(m)} = Y_R^{(m)} \Lambda^{(m)}, \quad T^{(m)H} Y_L^{(m)} = Y_L^{(m)} \bar{\Lambda}^{(m)}, \qquad (2.10) $$

where $\Lambda^{(m)} = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_m)$ holds the Ritz values. The left and right Ritz vectors are given by

$$ Z_R^{(m)} = V_m Y_R^{(m)}, \quad Z_L^{(m)} = W_m Y_L^{(m)}. \qquad (2.11) $$

The non-symmetric Lanczos algorithm has the advantage of using short recurrences, which require the storage of only six vectors per iteration. However, as in the symmetric case, it is subject to two main problems. The first is instability due to possible breakdown, when $\langle v_{j+1}, w_{j+1} \rangle = 0$ with $v_{j+1} \neq 0$ and $w_{j+1} \neq 0$. This problem is addressed by look-ahead versions of the algorithm [18]-[21]. The second problem is the loss of bi-orthogonality between $V_m$ and $W_m$ due to round-off errors. As in Paige's theorem for the symmetric Lanczos algorithm [22], this loss is closely related to the convergence of some eigenvalues: as a Ritz value converges, bi-orthogonality degrades to the level of the square root of the machine precision. The loss of bi-orthogonality has no serious effect on solving the linear system, but it does affect eigenvector computation with the non-symmetric Lanczos algorithm. In particular, it leads to the appearance of spurious eigenvalues, and of multiple copies of the same eigenvalue as Ritz values begin to converge. Curing the loss of bi-orthogonality requires some kind of re-orthogonalization [23]-[25], which in turn increases the storage requirements. Both problems deserve careful study; they are not addressed in this report and will be the subject of further work.

Note that in order to find the solution of the linear system using the Lanczos algorithm via Equation (2.6), we would need to store all the vectors $V_m$. This can be avoided using the BiCG algorithm described below. In contrast, the storage of all vectors in $V_m$ and $W_m$ is essential when solving for the left and right eigenvectors as given by Equation (2.11). In this report, we show how to compute eigenvectors using only a small window of vectors, with accuracy comparable to the un-restarted Lanczos algorithm.
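In code, the Rayleigh-Ritz step of Eqs. (2.10)-(2.11) is a small dense eigenproblem on $T^{(m)}$; a sketch with hypothetical helper names, reusing the `nonsymmetric_lanczos` output above:

```python
import numpy as np

def ritz_pairs(V, W, T, k):
    # Rayleigh-Ritz as in Eqs. (2.10)-(2.11): eigenpairs of the small
    # projection matrix T lifted through V (right) and W (left); returns
    # the k Ritz values of smallest magnitude with their Ritz vectors.
    lam, YR = np.linalg.eig(T)
    mu, YL = np.linalg.eig(T.conj().T)     # eigenvalues mu = conj(lam)
    idx_r = np.argsort(np.abs(lam))[:k]
    # pair each right eigenvalue with the left vector of the conjugate value
    idx_l = [int(np.argmin(np.abs(np.conj(mu) - lam[i]))) for i in idx_r]
    ZR = V @ YR[:, idx_r]                  # right Ritz vectors, Eq. (2.11)
    ZL = W @ YL[:, idx_l]                  # left Ritz vectors
    return lam[idx_r], ZR, ZL
```

The quality of a Ritz pair can then be monitored through the residual norm $\|Az_R - \lambda z_R\|$, which is the quantity reported in the tables of Section 7.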

2.2. The BiCG Algorithm. The BiCG algorithm can be derived from the two-sided Lanczos algorithm and the LDU-factorization of the tridiagonal matrix $T^{(m)}$, in the same manner CG is derived from the symmetric Lanczos algorithm. In this case we work with the residual vectors $r$ and $\hat{r}$ of the systems in Eq. (2.1) and Eq. (2.2), respectively. The BiCG algorithm is given in Algorithm 2.

Algorithm 2: The BiCG algorithm
  Given $x_0$, compute $r_0 = b - Ax_0$, and set $p_0 = r_0$.
  Choose $\hat{r}_0$ such that $\langle r_0, \hat{r}_0 \rangle \neq 0$, and set $\hat{p}_0 = \hat{r}_0$. Set $\rho_0 = \langle r_0, \hat{r}_0 \rangle$.
  for $j = 1, 2, \ldots$ till convergence do
    Compute $Ap_{j-1}$ and $A^H \hat{p}_{j-1}$.
    Compute $\sigma_{j-1} = \langle Ap_{j-1}, \hat{p}_{j-1} \rangle$ and $\alpha_{j-1} = \rho_{j-1} / \sigma_{j-1}$.
    Set $x_j = x_{j-1} + \alpha_{j-1} p_{j-1}$.
    Compute $r_j = r_{j-1} - \alpha_{j-1} Ap_{j-1}$ and $\hat{r}_j = \hat{r}_{j-1} - \bar{\alpha}_{j-1} A^H \hat{p}_{j-1}$.
    Compute $\rho_j = \langle r_j, \hat{r}_j \rangle$ and $\beta_{j-1} = \rho_j / \rho_{j-1}$.
    Set $p_j = r_j + \beta_{j-1} p_{j-1}$ and $\hat{p}_j = \hat{r}_j + \bar{\beta}_{j-1} \hat{p}_{j-1}$.
  end for

In the BiCG algorithm, we have the biorthogonality conditions

$$ \langle r_k, \hat{r}_l \rangle = 0 \ \text{for } k \neq l, \qquad \langle Ap_k, \hat{p}_l \rangle = 0 \ \text{for } k \neq l. \qquad (2.12) $$

The Lanczos vectors $V$ and $W$ are not built explicitly or stored. Instead, the solution and residuals are updated in every iteration. The Lanczos vectors are collinear with the residuals:

$$ v_j = \eta_j r_{j-1}, \quad w_j = \zeta_j \hat{r}_{j-1}, \quad j = 1, 2, \ldots. \qquad (2.13) $$

The coefficients $\eta_j$ and $\zeta_j$ are chosen such that

$$ \langle v_j, w_j \rangle = \eta_j \bar{\zeta}_j \langle r_{j-1}, \hat{r}_{j-1} \rangle = 1. \qquad (2.14) $$

The choice that corresponds to the non-symmetric Lanczos algorithm given above is to take $\|v_j\| = 1$, which gives

$$ \eta_j = \frac{1}{\|r_{j-1}\|}, \quad \bar{\zeta}_j = \frac{\|r_{j-1}\|}{\langle r_{j-1}, \hat{r}_{j-1} \rangle}. \qquad (2.15) $$

The elements of the tridiagonal matrix $T$ of the non-symmetric Lanczos algorithm are related to the parameters $\alpha_j$, $\beta_j$, $\eta_j$, and $\zeta_j$. In order to use BiCG to compute eigenvalues as well as solve the linear system, we need to find these $T$ matrix elements. From BiCG we have

$$ r_{j-1} = \begin{cases} p_{j-1}, & \text{if } j = 1 \\ p_{j-1} - \beta_{j-2} p_{j-2}, & \text{if } j > 1, \end{cases} \qquad (2.16) $$

and, similarly,

$$ \hat{r}_{j-1} = \begin{cases} \hat{p}_{j-1}, & \text{if } j = 1 \\ \hat{p}_{j-1} - \bar{\beta}_{j-2} \hat{p}_{j-2}, & \text{if } j > 1, \end{cases} \qquad (2.17) $$

where

$$ \beta_{j-1} = \frac{\langle r_j, \hat{r}_j \rangle}{\langle r_{j-1}, \hat{r}_{j-1} \rangle}, \quad \alpha_{j-1} = \frac{\langle r_{j-1}, \hat{r}_{j-1} \rangle}{\langle Ap_{j-1}, \hat{p}_{j-1} \rangle}, \quad j = 1, 2, \ldots. \qquad (2.18) $$

The diagonal $T$ matrix elements can be computed as follows:

$$ T_{jj} = \langle Av_j, w_j \rangle = \eta_j \bar{\zeta}_j \langle Ar_{j-1}, \hat{r}_{j-1} \rangle = \frac{\langle Ar_{j-1}, \hat{r}_{j-1} \rangle}{\langle r_{j-1}, \hat{r}_{j-1} \rangle}. \qquad (2.19) $$

Using Eqs. (2.16), (2.17), and (2.12) we get

$$ T_{11} = \frac{\langle Ap_0, \hat{p}_0 \rangle}{\langle r_0, \hat{r}_0 \rangle} = \frac{1}{\alpha_0}, \qquad (2.20) $$

and, for $j > 1$,

$$ T_{jj} = \frac{\langle A(p_{j-1} - \beta_{j-2} p_{j-2}), \hat{p}_{j-1} - \bar{\beta}_{j-2} \hat{p}_{j-2} \rangle}{\langle r_{j-1}, \hat{r}_{j-1} \rangle} = \frac{1}{\alpha_{j-1}} + \frac{\beta_{j-2}}{\alpha_{j-2}} - \beta_{j-2} \frac{\langle Ap_{j-1}, \hat{p}_{j-2} \rangle + \langle Ap_{j-2}, \hat{p}_{j-1} \rangle}{\langle r_{j-1}, \hat{r}_{j-1} \rangle}. \qquad (2.21) $$

Using the biorthogonality relations Eq. (2.12), this simplifies to

$$ T_{jj} = \frac{1}{\alpha_{j-1}} + \frac{\beta_{j-2}}{\alpha_{j-2}}, \quad j > 1. \qquad (2.22) $$

The off-diagonal element $T_{j,j+1}$ is given by

$$ T_{j,j+1} = \langle Av_{j+1}, w_j \rangle = \eta_{j+1} \bar{\zeta}_j \langle Ar_j, \hat{r}_{j-1} \rangle = \begin{cases} \eta_{j+1} \bar{\zeta}_j \langle A(p_j - \beta_{j-1} p_{j-1}), \hat{p}_{j-1} \rangle, & j = 1 \\ \eta_{j+1} \bar{\zeta}_j \langle A(p_j - \beta_{j-1} p_{j-1}), \hat{p}_{j-1} - \bar{\beta}_{j-2} \hat{p}_{j-2} \rangle, & j > 1. \end{cases} \qquad (2.23) $$

Using the biorthogonality relations, we get

$$ T_{j,j+1} = -\eta_{j+1} \bar{\zeta}_j \frac{\langle r_j, \hat{r}_j \rangle}{\alpha_{j-1}} = -\frac{\|r_{j-1}\|}{\|r_j\|} \frac{\beta_{j-1}}{\alpha_{j-1}}, \quad j = 1, 2, \ldots, \qquad (2.24) $$

where the last equality uses Eq. (2.15). Similarly,

$$ T_{j+1,j} = \langle Av_j, w_{j+1} \rangle = \eta_j \bar{\zeta}_{j+1} \langle Ar_{j-1}, \hat{r}_j \rangle = \begin{cases} \eta_j \bar{\zeta}_{j+1} \langle Ap_{j-1}, \hat{p}_j - \bar{\beta}_{j-1} \hat{p}_{j-1} \rangle, & j = 1 \\ \eta_j \bar{\zeta}_{j+1} \langle A(p_{j-1} - \beta_{j-2} p_{j-2}), \hat{p}_j - \bar{\beta}_{j-1} \hat{p}_{j-1} \rangle, & j > 1, \end{cases} \qquad (2.25) $$

and using the biorthogonality relations again,

$$ T_{j+1,j} = -\eta_j \bar{\zeta}_{j+1} \frac{\langle r_j, \hat{r}_j \rangle}{\alpha_{j-1}} = -\frac{\|r_j\|}{\|r_{j-1}\|} \frac{1}{\alpha_{j-1}}, \quad j = 1, 2, \ldots, \qquad (2.26) $$

where again the last equality uses Eq. (2.15).
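Relations (2.20), (2.22), (2.24), and (2.26) mean that the tridiagonal projection matrix can be accumulated essentially for free during BiCG. The following sketch (our illustration, under the same no-breakdown assumption) assembles $T$ from the BiCG scalars:

```python
import numpy as np

def bicg_with_T(A, b, maxit, tol=1e-8):
    # BiCG (Algorithm 2) that also assembles the Lanczos tridiagonal matrix
    # T from alpha, beta, and residual norms via Eqs. (2.20)-(2.26).
    x = np.zeros_like(b)
    r = b - A @ x
    rhat = r.copy()                       # any choice with <r0, rhat0> != 0
    p, phat = r.copy(), rhat.copy()
    rho = np.vdot(rhat, r)                # <r, rhat> = rhat^H r
    T = np.zeros((maxit + 1, maxit + 1), dtype=complex)
    alpha_prev = beta_prev = 0.0
    bnorm = np.linalg.norm(b)
    for j in range(maxit):
        Ap = A @ p
        Ahp = A.conj().T @ phat
        alpha = rho / np.vdot(phat, Ap)   # sigma = <Ap, phat> = phat^H A p
        x = x + alpha * p
        rnorm_old = np.linalg.norm(r)
        r = r - alpha * Ap
        rhat = rhat - np.conj(alpha) * Ahp
        rnorm = np.linalg.norm(r)
        rho_new = np.vdot(rhat, r)
        beta = rho_new / rho
        # T entries: diagonal from Eqs. (2.20)/(2.22), off-diagonals from
        # Eqs. (2.24)/(2.26)
        T[j, j] = 1 / alpha + (beta_prev / alpha_prev if j > 0 else 0.0)
        T[j, j + 1] = -(rnorm_old / rnorm) * beta / alpha
        T[j + 1, j] = -(rnorm / rnorm_old) / alpha
        if rnorm <= tol * bnorm:
            return x, T[: j + 1, : j + 1]
        p = r + beta * p
        phat = rhat + np.conj(beta) * phat
        rho, alpha_prev, beta_prev = rho_new, alpha, beta
    return x, T[:maxit, :maxit]
```

The eigenvalues of the leading block of this $T$ approximate eigenvalues of $A$, which is the observation that eigBICG builds on.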

2.3. The QMR algorithm from BiCG. As discussed in Section 2, the QMR and BiCG algorithms are both based on the two-sided Lanczos algorithm; they differ in the way the solutions of the linear systems are updated. It turns out [26],[27] that the solution of the $m$-dimensional linear system in Eq. (2.8) and that of the $(m+1)$-dimensional least-squares problem in Eq. (2.9) are related to each other. This allows for a version of the QMR algorithm in which the updates are constructed from the BiCG updates. Consequently, it is possible to write a single algorithm that combines BiCG and QMR. The main result connecting the two algorithms is that the solutions $y_m^q$ and $y_{m-1}^q$ of the least-squares problem at the $m$-th and $(m-1)$-th steps are related to the solution $y_m^b$ of the linear system at the $m$-th step by

$$ y_m^q = (1 - \psi_m) \begin{bmatrix} y_{m-1}^q \\ 0 \end{bmatrix} + \psi_m y_m^b, \quad m = 1, 2, \ldots, \qquad (2.27) $$

where the coefficient $\psi_m$ is given by

$$ \psi_m = \frac{1}{1 + \theta_m}, \quad \theta_m = \frac{1}{\tau_{m-1}} \| \xi e_1^{(m+1)} - \tilde{T}^{(m)} y_m^b \|^2. \qquad (2.28) $$

Moreover, the coefficients $\tau_m$ can be updated from step to step by

$$ \tau_m = \tau_{m-1} \theta_m \psi_m, \quad \text{where } \tau_0 := \|r_0\|^2. \qquad (2.29) $$

If the bi-Lanczos vectors $v_j$ are chosen such that $\|v_j\| = 1$, then it can be shown that

$$ \| \xi e_1^{(m+1)} - \tilde{T}^{(m)} y_m^b \| = \|r_m^{bicg}\|, \qquad (2.30) $$

and consequently

$$ \theta_m = \frac{\|r_m^{bicg}\|^2}{\tau_{m-1}}. \qquad (2.31) $$

An implementation of QMR based on the BiCG algorithm, following [27], is given in Algorithm 3.

Algorithm 3: The BiCG/QMR algorithm
  Choose an initial guess $x_0$ and set $x_0^{qmr} = x_0^{bicg} = x_0$.
  Compute $r_0^{bicg} = r_0 = b - Ax_0$, and set $p_0 = r_0^{bicg}$.
  Choose $\hat{r}_0$ such that $\langle r_0, \hat{r}_0 \rangle \neq 0$, and set $\hat{p}_0 = \hat{r}_0$. Set $\rho_0 = \langle r_0, \hat{r}_0 \rangle$.
  (QMR) Set $\theta_0 = 0$, $\tau_0 = \|r_0\|^2$, $q_0 = 0$.
  for $j = 1, 2, \ldots$ till convergence do
    Compute $Ap_{j-1}$ and $A^H \hat{p}_{j-1}$.
    Compute $\sigma_{j-1} = \langle Ap_{j-1}, \hat{p}_{j-1} \rangle$ and $\alpha_{j-1} = \rho_{j-1} / \sigma_{j-1}$.
    If BiCG iterates are desired, set $x_j^{bicg} = x_{j-1}^{bicg} + \alpha_{j-1} p_{j-1}$.
    Compute $r_j^{bicg} = r_{j-1}^{bicg} - \alpha_{j-1} Ap_{j-1}$ and $\hat{r}_j = \hat{r}_{j-1} - \bar{\alpha}_{j-1} A^H \hat{p}_{j-1}$.
    (QMR) Compute $\theta_j = \|r_j^{bicg}\|^2 / \tau_{j-1}$, $\psi_j = 1/(1 + \theta_j)$, $\tau_j = \tau_{j-1} \theta_j \psi_j$,
      $q_j = \psi_j \theta_{j-1} q_{j-1} + \psi_j \alpha_{j-1} p_{j-1}$, and $x_j^{qmr} = x_{j-1}^{qmr} + q_j$.
    Compute $\rho_j = \langle r_j^{bicg}, \hat{r}_j \rangle$ and $\beta_{j-1} = \rho_j / \rho_{j-1}$.
    Set $p_j = r_j^{bicg} + \beta_{j-1} p_{j-1}$ and $\hat{p}_j = \hat{r}_j + \bar{\beta}_{j-1} \hat{p}_{j-1}$.
  end for

Except for the steps marked (QMR), the algorithm is identical to the BiCG algorithm. The advantage is that we can write a single algorithm that can run as either QMR or BiCG.
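Equations (2.27)-(2.31) amount to a cheap smoothing of the BiCG iterates that needs only scalars and the current direction vector. A minimal sketch of just the (QMR) update steps, assuming the BiCG quantities of Algorithm 3 are supplied each iteration:

```python
import numpy as np

class QMRSmoother:
    # QMR iterate built from BiCG quantities, per Eqs. (2.27)-(2.31):
    # maintains x_qmr using the BiCG step alpha and direction p.
    def __init__(self, x0, r0_norm):
        self.x = x0.copy()
        self.q = np.zeros_like(x0)
        self.tau = r0_norm**2          # tau_0 = ||r_0||^2, Eq. (2.29)
        self.theta = 0.0               # theta_0

    def update(self, alpha, p, r_bicg_norm):
        theta_new = r_bicg_norm**2 / self.tau      # Eq. (2.31)
        psi = 1.0 / (1.0 + theta_new)              # Eq. (2.28)
        self.tau = self.tau * theta_new * psi      # Eq. (2.29)
        self.q = psi * self.theta * self.q + psi * alpha * p
        self.x = self.x + self.q                   # x_qmr update
        self.theta = theta_new
        return self.x
```

Calling `update(alpha, p, norm(r_bicg))` after each BiCG residual update produces the QMR iterate at negligible extra cost, which is why BiCG and QMR are merged into a single algorithm here.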

Note also that, for eigenvalue computation, there is no difference between QMR and BiCG, as they build the same Krylov subspaces. From this point on, we use this combined BiCG/QMR algorithm, and the choice of solving the linear system with QMR or BiCG is left to the user.

2.4. J-symmetric BiCG/QMR algorithm. For a general non-Hermitian matrix, both the BiCG and QMR algorithms require matrix-vector multiplications with $A$ and $A^H$. In some cases the matrix $A$ satisfies a Hermiticity condition with respect to a Hermitian non-singular matrix $J$:

$$ JA = A^H J, \quad J = J^H. \qquad (2.32) $$

In this case, $A$ is called J-symmetric. When the matrix $J$ is known, and when its multiplication with a vector is much cheaper than the multiplication of $A^H$ with a vector, we can use this relation to simplify the BiCG and QMR algorithms. The simplification is obtained by choosing $\hat{r}_0$ (and consequently $\hat{p}_0$) as

$$ \hat{r}_0 = Jr_0 \implies \hat{p}_0 = Jp_0. \qquad (2.33) $$

This choice, together with the J-Hermiticity condition, leads to the following:
- $\rho_{j-1}$, $\sigma_{j-1}$, $\alpha_{j-1}$, and $\beta_{j-1}$ are real for $j = 1, 2, \ldots$;
- $\hat{r}_{j-1} = Jr_{j-1}$ and $\hat{p}_{j-1} = Jp_{j-1}$ for $j = 1, 2, \ldots$.

These properties can be proven by induction, or simply by checking the first few iterations of the algorithm and observing that they carry over from one iteration to the next. A J-symmetric version of the BiCG and QMR algorithms is given in Algorithm 4 [16].

Algorithm 4: J-symmetric BiCG/QMR algorithm
  Choose an initial guess $x_0$ and set $x_0^{qmr} = x_0^{bicg} = x_0$.
  Compute $r_0^{bicg} = r_0 = b - Ax_0$, set $p_0 = r_0^{bicg}$, and $\rho_0 = \langle r_0, Jr_0 \rangle$.
  (QMR) Set $\theta_0 = 0$, $\tau_0 = \|r_0\|^2$, $q_0 = 0$.
  for $j = 1, 2, \ldots$ till convergence do
    Compute $Ap_{j-1}$.
    Compute $\sigma_{j-1} = \langle Ap_{j-1}, Jp_{j-1} \rangle$ and $\alpha_{j-1} = \rho_{j-1} / \sigma_{j-1}$.
    If BiCG iterates are desired, set $x_j^{bicg} = x_{j-1}^{bicg} + \alpha_{j-1} p_{j-1}$.
    Compute $r_j^{bicg} = r_{j-1}^{bicg} - \alpha_{j-1} Ap_{j-1}$.
    (QMR) Compute $\theta_j = \|r_j^{bicg}\|^2 / \tau_{j-1}$, $\psi_j = 1/(1 + \theta_j)$, $\tau_j = \tau_{j-1} \theta_j \psi_j$,
      $q_j = \psi_j \theta_{j-1} q_{j-1} + \psi_j \alpha_{j-1} p_{j-1}$, and $x_j^{qmr} = x_{j-1}^{qmr} + q_j$.
    Set $\rho_j = \langle r_j^{bicg}, Jr_j^{bicg} \rangle$ and compute $\beta_{j-1} = \rho_j / \rho_{j-1}$.
    Set $p_j = r_j^{bicg} + \beta_{j-1} p_{j-1}$.
  end for

In addition to avoiding the matrix-vector product with $A^H$, the J-symmetric algorithm requires less storage, because we do not need to store the vectors $\hat{r}_j$ and $\hat{p}_j$.
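To illustrate how the choice (2.33) removes the product with $A^H$, here is a sketch of the BiCG core of Algorithm 4, with $J$ passed as a cheap callable (for Wilson fermions, applying $\gamma_5$ merely flips the sign of half the spin components):

```python
import numpy as np

def j_symmetric_bicg(A, J, b, maxit, tol=1e-8):
    # BiCG for J-symmetric A (J A = A^H J): the shadow vectors are
    # rhat = J r and phat = J p, so A^H is never applied.
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rho = np.vdot(J(r), r)          # <r, Jr> = (Jr)^H r, real here
    bnorm = np.linalg.norm(b)
    for _ in range(maxit):
        Ap = A @ p
        sigma = np.vdot(J(p), Ap)   # <Ap, Jp> = (Jp)^H A p
        alpha = rho / sigma
        x = x + alpha * p
        r = r - alpha * Ap
        if np.linalg.norm(r) <= tol * bnorm:
            break
        rho_new = np.vdot(J(r), r)
        beta = rho_new / rho
        p = r + beta * p
        rho = rho_new
    return x
```

Compared with plain BiCG, each iteration performs one product with $A$ and two applications of $J$ instead of one product each with $A$ and $A^H$.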

Another important property of J-symmetric matrices concerns the eigenvalue spectrum and the relation between left and right eigenvectors. A general non-defective non-Hermitian matrix has left and right eigenvectors defined by

$$ AZ_R = Z_R \Lambda, \quad A^H Z_L = Z_L \bar{\Lambda}, \qquad (2.34) $$

where $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots)$ is the matrix of eigenvalues. In addition, the left and right eigenvectors are bi-orthogonal, and we can choose a normalization such that

$$ Z_L^H Z_R = I. \qquad (2.35) $$

In the case of a J-symmetric matrix, we have

$$ A^H (JZ_R) = J(AZ_R) = (JZ_R) \Lambda. \qquad (2.36) $$

Eq. (2.36) shows two things: first, $JZ_R$ are left eigenvectors; second, since for left eigenvectors we should have $\bar{\Lambda}$ on the right-hand side, the eigenvalues of a J-symmetric matrix are either real or come in complex conjugate pairs. In other words, if $(\lambda, \bar{\lambda})$ is a conjugate pair of eigenvalues with corresponding right eigenvectors $(z_r^{\lambda}, z_r^{\bar{\lambda}})$, then $(Jz_r^{\lambda}, Jz_r^{\bar{\lambda}})$ are the corresponding left eigenvectors for the eigenvalues $(\bar{\lambda}, \lambda)$. So, for J-symmetric matrices, if we have a set of right eigenvectors whose eigenvalues are either real or appear in conjugate pairs, then the left eigenvectors are obtained through multiplication with $J$ and re-ordering of the right vectors.

3. The eigBICG/QMR Algorithm. Given the earlier discussion of the eigCG algorithm and of the BiCG/QMR algorithm for non-symmetric linear systems and eigenvalue problems, we present in this section the eigBICG algorithm. The construction of the new algorithm closely follows that of eigCG, with the modifications needed for the non-symmetric case. Our goal is to add functionality to the BiCG/QMR algorithm that uses a limited-size window of vectors $V$ and $W$, restarted in a certain way, to compute approximations to a few left and right eigenvectors with small eigenvalues. The eigenvalue computation is built on top of the linear solver without affecting it.

Let $nev$ be the number of eigenvectors we want to compute, and let $m > 2\,nev$ be the maximum size of the search subspace used to find approximations of the eigenvectors. The method for finding approximate eigenvectors is as follows. Let $V_m = [v_1, v_2, \ldots, v_m]$ and $W_m = [w_1, w_2, \ldots, w_m]$ be right and left search spaces of dimension $m$ such that $W_m$ and $V_m$ are bi-orthogonal. Let $Z^{(m)}$ and $S^{(m)}$ be the approximate right and left eigenvectors computed with the Rayleigh-Ritz procedure using $V_m$ and $W_m$ as search spaces, and let $Z^{(m-1)}$ and $S^{(m-1)}$ be the approximations obtained using only $V_{m-1}$ and $W_{m-1}$. Then, if we are interested in finding $nev < m$ eigenvectors, we can get a better approximation by applying the Rayleigh-Ritz procedure to the $2\,nev$-dimensional bi-orthogonal bases of the subspaces

$$ \left[ Z_1^{(m)}, Z_2^{(m)}, \ldots, Z_{nev}^{(m)}; Z_1^{(m-1)}, Z_2^{(m-1)}, \ldots, Z_{nev}^{(m-1)} \right], $$
$$ \left[ S_1^{(m)}, S_2^{(m)}, \ldots, S_{nev}^{(m)}; S_1^{(m-1)}, S_2^{(m-1)}, \ldots, S_{nev}^{(m-1)} \right]. \qquad (3.1) $$

Let $T^{(m)} = W_m^H A V_m$ and $T^{(m-1)} = W_{m-1}^H A V_{m-1}$ be the projection matrices for the size-$m$ and size-$(m-1)$ subspaces, let $Y^{(m)}, G^{(m)}$ be the right and left eigenvectors of $T^{(m)}$ and, similarly, $Y^{(m-1)}, G^{(m-1)}$ those of $T^{(m-1)}$, and let $\tilde{Y}^{(m-1)}, \tilde{G}^{(m-1)}$ be the vectors obtained by appending a zero at the end of $Y^{(m-1)}$ and $G^{(m-1)}$. Then we have

$$ Z^{(m)} = V_m Y^{(m)}, \quad S^{(m)} = W_m G^{(m)}, \quad Z^{(m-1)} = V_m \tilde{Y}^{(m-1)}, \quad S^{(m-1)} = W_m \tilde{G}^{(m-1)}. \qquad (3.2) $$

Using Equation (3.2), the bi-orthogonalization step need only be performed at the level of the small eigenvectors of the projection matrix $T$. This yields two important simplifications. First, finding bi-orthogonal bases of the sets (3.1) is equivalent to finding bi-orthogonal bases of the sets

$$ \left[ Y_1^{(m)}, Y_2^{(m)}, \ldots, Y_{nev}^{(m)}; \tilde{Y}_1^{(m-1)}, \tilde{Y}_2^{(m-1)}, \ldots, \tilde{Y}_{nev}^{(m-1)} \right], $$
$$ \left[ G_1^{(m)}, G_2^{(m)}, \ldots, G_{nev}^{(m)}; \tilde{G}_1^{(m-1)}, \tilde{G}_2^{(m-1)}, \ldots, \tilde{G}_{nev}^{(m-1)} \right], \qquad (3.3) $$

which is very cheap, as these are vectors of size $m$. Second, except at the very end, there is no need to build the approximate eigenvectors of $A$.

To improve the approximation of the eigenvectors, the procedure described above is restarted as follows. After the first $m$ iterations of BiCG, we have $V_m$ and $W_m$ obtained from the first BiCG residuals, and the projection matrix $T^{(m)}$ has tridiagonal form. One then computes $2\,nev$ approximate right and left eigenvectors $Q_r$ and $Q_l$, with their corresponding $2\,nev$ eigenvalues, as described above. We then restart the search subspaces with these eigenvectors, setting $V_{1:2nev} = Q_r$ and $W_{1:2nev} = Q_l$, append newly generated residual vectors from BiCG to this set until it again has $m$ vectors, and recompute new approximations $Q_r$ and $Q_l$. Note that after the restart, the projection matrix has a diagonal part of size $2nev \times 2nev$, while the remaining part is tridiagonal except for the $(2nev+1)$-th row and column. The tridiagonal part is filled in from the BiCG iterations as before the restart. The $(2nev+1)$-th row and column come from the first new residual vectors added after the restart. Calling these two vectors $\hat{r}_{new}$ and $r_{new}$, we need to compute $\hat{r}_{new}^H A V_{1:2nev}$ and $W_{1:2nev}^H A r_{new}$. The matrix-vector products $Ar_{new}$ and $A^H \hat{r}_{new}$ are avoided by storing the vectors $Ap_{prev}$ and $A^H \hat{p}_{prev}$ from the previous BiCG iteration. Immediately after the restart, the projection matrix is given by

$$ T = W^H A V = \begin{pmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_{2nev} \end{pmatrix}. \qquad (3.4) $$

This process is repeated as long as the linear system has not converged; it does not interfere with the BiCG iterations. At the end, when the linear system has converged to the desired tolerance, we compute a final approximation of the Ritz vectors using the latest subspaces $V_s$ and $W_s$, where $2nev \leq s \leq m$.

The eigenvalue part of eigBICG requires the storage of the $2m$ vectors in $V_m$ and $W_m$, plus two more vectors $Ap_{prev}$ and $A^H \hat{p}_{prev}$ needed to avoid matrix-vector products when restarting. This is much less than storing all the Lanczos vectors, as un-restarted Lanczos would require for computing eigenvectors, so it is far more memory efficient. Computationally, it also adds minimal cost on top of BiCG: no extra matrix-vector products are needed, and all added computations involve matrices and vectors of dimension $m$. The method is, however, still subject to the same problems associated with the Lanczos algorithm, namely possible breakdown and loss of bi-orthogonality. We also note that the eigenvector computation part is the same for BiCG and QMR.

The eigBICG/QMR(nev,m) algorithm, which solves a non-symmetric linear system and computes approximations to $nev$ left and right eigenvectors and eigenvalues using subspaces of maximum dimension $m$, is given in Algorithms 5 and 6.

Algorithm 5: eigBICG/QMR(nev,m)
  Choose an initial guess $x_0$ and set $x_0^{qmr} = x_0^{bicg} = x_0$.
  Compute $r_0^{bicg} = r_0 = b - Ax_0$, and set $p_0 = r_0^{bicg}$.
  Choose $\hat{r}_0$ such that $\langle r_0, \hat{r}_0 \rangle \neq 0$, and set $\hat{p}_0 = \hat{r}_0$. Set $\rho_0 = \langle r_0, \hat{r}_0 \rangle$.
  (QMR) Set $\theta_0 = 0$, $\tau_0 = \|r_0\|^2$, $q_0 = 0$.
  Set $vs = 0$.
  for $j = 1, 2, \ldots$ till convergence do
    Compute $Ap_{j-1}$ and $A^H \hat{p}_{j-1}$.
    Compute $\sigma_{j-1} = \langle Ap_{j-1}, \hat{p}_{j-1} \rangle$ and $\alpha_{j-1} = \rho_{j-1} / \sigma_{j-1}$.
    If BiCG iterates are desired, set $x_j^{bicg} = x_{j-1}^{bicg} + \alpha_{j-1} p_{j-1}$.
    if $vs = m - 1$ then
      Set $Ap_{prev} = Ap_{j-1}$ and $A^H\hat{p}_{prev} = A^H \hat{p}_{j-1}$.
    end if
    if $vs = m$ then
      Compute eigenvalues and restart using Algorithm 6.
    end if
    Set $vs = vs + 1$, $v_{vs} = \frac{1}{\|r_{j-1}\|} r_{j-1}$, and $w_{vs} = \frac{\|r_{j-1}\|}{\bar{\rho}_{j-1}} \hat{r}_{j-1}$.
    Compute the diagonal $T$ matrix element:
      $T_{vs,vs} = \frac{1}{\alpha_{j-1}}$ if $j = 1$, and $T_{vs,vs} = \frac{1}{\alpha_{j-1}} + \frac{\beta_{j-2}}{\alpha_{j-2}}$ if $j > 1$.
    if $vs < m$ then
      Compute the off-diagonal $T$ matrix elements:
      $T_{vs,vs+1} = -\frac{\|r_{j-1}\|}{\|r_j\|} \frac{\beta_{j-1}}{\alpha_{j-1}}$, $T_{vs+1,vs} = -\frac{\|r_j\|}{\|r_{j-1}\|} \frac{1}{\alpha_{j-1}}$.
    end if
    Compute $r_j^{bicg} = r_{j-1}^{bicg} - \alpha_{j-1} Ap_{j-1}$ and $\hat{r}_j = \hat{r}_{j-1} - \bar{\alpha}_{j-1} A^H \hat{p}_{j-1}$.
    (QMR) Compute $\theta_j = \|r_j^{bicg}\|^2 / \tau_{j-1}$, $\psi_j = 1/(1 + \theta_j)$, $\tau_j = \tau_{j-1} \theta_j \psi_j$,
      $q_j = \psi_j \theta_{j-1} q_{j-1} + \psi_j \alpha_{j-1} p_{j-1}$, and $x_j^{qmr} = x_{j-1}^{qmr} + q_j$.
    Set $\rho_j = \langle r_j^{bicg}, \hat{r}_j \rangle$ and compute $\beta_{j-1} = \rho_j / \rho_{j-1}$.
    Set $p_j = r_j^{bicg} + \beta_{j-1} p_{j-1}$ and $\hat{p}_j = \hat{r}_j + \bar{\beta}_{j-1} \hat{p}_{j-1}$.
  end for

Algorithm 6: Eigenvalue part of eigBICG/QMR
  Compute the lowest $nev$ left and right eigenvectors $G^{(m)}, Y^{(m)}$ and eigenvalues $\Lambda^{(m)}$ of $T_m$.
  Compute the lowest $nev$ left and right eigenvectors $G^{(m-1)}, Y^{(m-1)}$ and eigenvalues $\Lambda^{(m-1)}$ of $T_{m-1}$.
  Extend the vectors $G^{(m-1)}, Y^{(m-1)}$ by appending zeros in the last row to get $\tilde{G}^{(m-1)}, \tilde{Y}^{(m-1)}$.
  Define the sets $Y_r = [Y_{1:nev}^{m}, \tilde{Y}_{1:nev}^{m-1}]$ and $Y_l = [G_{1:nev}^{m}, \tilde{G}_{1:nev}^{m-1}]$, each of dimension $m \times 2nev$.
  Bi-orthogonalize $Y_r$ and $Y_l$ to get $\tilde{Y}_r$ and $\tilde{Y}_l$.
  Define $H = \tilde{Y}_l^H T_m \tilde{Y}_r$ of dimension $2nev \times 2nev$.
  Compute the left eigenvectors $Q_l$, right eigenvectors $Q_r$, and eigenvalues $E_h$ of $H$.
  Set the first $2nev$ Lanczos vectors to the $2nev$ Ritz vectors: $V_{(1:2nev)} = V_{(1:m)} \tilde{Y}_r Q_r$ and $W_{(1:2nev)} = W_{(1:m)} \tilde{Y}_l Q_l$.
  Set $T_m = 0$ and $T_{(1:2nev,1:2nev)} = \mathrm{diag}(E_h)$. Set $vs = 2nev$.
  Set $T_{(2nev+1,1:2nev)} = \frac{\|r_{j-1}\|}{\rho_{j-1}} (A^H \hat{p}_{j-1} - \bar{\beta}_{j-2} A^H\hat{p}_{prev})^H V_{(1:2nev)}$.
  Set $T_{(1:2nev,2nev+1)} = \frac{1}{\|r_{j-1}\|} W_{(1:2nev)}^H (Ap_{j-1} - \beta_{j-2} Ap_{prev})$.
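A schematic NumPy sketch of the core of Algorithm 6 (hypothetical helper names; it assumes the small LU factorization used for the bi-orthogonalization exists and that $H$ has distinct eigenvalues, and it omits the update of the $(2nev+1)$-th row and column of $T$):

```python
import numpy as np
from scipy.linalg import eig, lu, inv

def lowest(T, k):
    # k right and left eigenvector pairs of T with smallest |eigenvalue|
    lam, Y = eig(T)
    mu, G = eig(T.conj().T)
    return Y[:, np.argsort(np.abs(lam))[:k]], G[:, np.argsort(np.abs(mu))[:k]]

def eig_restart(V, W, T, nev):
    # Rayleigh-Ritz over the spans of Eq. (3.3), then restart the window
    # with 2*nev bi-orthogonal Ritz vectors; T becomes diag(Eh), Eq. (3.4).
    m = T.shape[0]
    Ym, Gm = lowest(T, nev)
    Ym1, Gm1 = lowest(T[: m - 1, : m - 1], nev)
    zrow = np.zeros((1, nev))
    Yr = np.hstack([Ym, np.vstack([Ym1, zrow])])       # right basis
    Yl = np.hstack([Gm, np.vstack([Gm1, zrow])])       # left basis
    # two-sided bi-orthogonalization via LU of Yl^H Yr, so Ytl^H Ytr = I
    PL, U = lu(Yl.conj().T @ Yr, permute_l=True)
    Ytr = Yr @ inv(U)
    Ytl = Yl @ inv(PL).conj().T
    H = Ytl.conj().T @ T @ Ytr                         # 2*nev x 2*nev
    Eh, Qr = eig(H)
    mu, Ql = eig(H.conj().T)
    # pair left eigenvectors with right ones (eigenvalues of H^H = conj(Eh))
    order = [int(np.argmin(np.abs(np.conj(mu) - e))) for e in Eh]
    Ql = Ql[:, order]
    d = np.sum(Ql.conj() * Qr, axis=0)                 # diag(Ql^H Qr)
    Ql = Ql / np.conj(d)                               # enforce Ql^H Qr = I
    Vnew = V @ (Ytr @ Qr)                              # restarted window
    Wnew = W @ (Ytl @ Ql)
    return Vnew, Wnew, np.diag(Eh)
```

All of this works on $m \times m$ and $m \times 2nev$ arrays, which is why the restart adds negligible cost to the solver.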

4. The J-symmetric eigBICG/QMR Algorithm. As discussed before, the BiCG/QMR algorithm simplifies for J-symmetric matrices, and the eigBICG/QMR algorithm can be simplified similarly. First, the left Lanczos vectors $w_j$ are related to the right vectors $v_j$: since $\hat{r}_j = Jr_j$, we have $w_j = Jv_j d_j$, where $d_j$ is determined from the biorthogonality condition $\langle v_j, w_j \rangle = 1$, which gives $d_j = 1/\langle v_j, Jv_j \rangle$; $d_j$ is real since $J$ is Hermitian. Second, using the fact that the eigenvalues are either real or come in conjugate pairs, together with the bi-orthogonality of left and right eigenvectors, we also have a relation between the left eigenvectors $w_j^{\lambda}$ and the right eigenvectors $v_j^{\lambda}$. In this case, however, one has to be careful: $w_j^{\lambda} = Jv_j^{\bar{\lambda}} c_j$, where $c_j = 1/\langle v_j^{\lambda}, Jv_j^{\bar{\lambda}} \rangle$. For real eigenvalues $c_j$ is real, while for complex eigenvalues it is complex. In general, for a set of vectors $V$ that contains Lanczos vectors as well as right eigenvectors (if the right eigenvector for the eigenvalue $\lambda$ is included in $V$, then the right eigenvector for $\bar{\lambda}$ has to be included as well), we have

$$ W = JVD^{-1}, \quad \text{where } D = V^H JV = D^H. \qquad (4.1) $$

For example, let $V = \{v_1, v_2, v_{\sigma}, v_{\lambda}, v_{\bar{\lambda}}\}$, where $v_1, v_2$ are Lanczos vectors, $v_{\sigma}$ is a right eigenvector for a real eigenvalue $\sigma$, and $v_{\lambda}, v_{\bar{\lambda}}$ are right eigenvectors corresponding to the conjugate pair of eigenvalues $\lambda, \bar{\lambda}$. Then

$$ V^H JV = \begin{pmatrix} a_1 & 0 & 0 & 0 & 0 \\ 0 & a_2 & 0 & 0 & 0 \\ 0 & 0 & a_{\sigma} & 0 & 0 \\ 0 & 0 & 0 & 0 & a_{\lambda} \\ 0 & 0 & 0 & \bar{a}_{\lambda} & 0 \end{pmatrix}, \quad D^{-1} = \begin{pmatrix} \frac{1}{a_1} & 0 & 0 & 0 & 0 \\ 0 & \frac{1}{a_2} & 0 & 0 & 0 \\ 0 & 0 & \frac{1}{a_{\sigma}} & 0 & 0 \\ 0 & 0 & 0 & 0 & \frac{1}{\bar{a}_{\lambda}} \\ 0 & 0 & 0 & \frac{1}{a_{\lambda}} & 0 \end{pmatrix}, \qquad (4.2) $$

where

$$ a_1 = \langle v_1, Jv_1 \rangle, \quad a_2 = \langle v_2, Jv_2 \rangle, \quad a_{\sigma} = \langle v_{\sigma}, Jv_{\sigma} \rangle, \quad a_{\lambda} = \langle v_{\lambda}, Jv_{\bar{\lambda}} \rangle. \qquad (4.3) $$

The projection matrix $T = W^H AV = D^{-1} V^H JAV$ is symmetric with respect to $D$, i.e., $DT = T^H D$. Since $T_m$ is symmetric with respect to $D$, the left basis corresponding to $\tilde{Y}_r$ in Algorithm 6 can be determined as $\tilde{Y}_l = D\tilde{Y}_r R^{-1}$, where the matrix $R$ is determined from the condition $\tilde{Y}_l^H \tilde{Y}_r = I$. This gives $R = \tilde{Y}_r^H D \tilde{Y}_r = R^H$. The matrix $H$ is then given by

$$ H = \tilde{Y}_l^H T_m \tilde{Y}_r = R^{-1} \tilde{Y}_r^H D T_m \tilde{Y}_r, \qquad (4.4) $$

and is symmetric with respect to $R$, i.e., $RH = H^H R$. When computing the eigenvalues of $H$, it is useful to enforce this symmetry and use $H_s = (H + R^{-1} H^H R)/2$. This guarantees that the eigenvalues of $H_s$ are either purely real or exact complex conjugate pairs in a finite precision calculation. For $j \leq m$ (i.e., before the first restart), $V$ contains only Lanczos vectors and the matrix $D$ is diagonal with elements $D_{j,j} = \langle v_j, Jv_j \rangle$.

After computing $nev_1 + nev_2$ right eigenvectors of $A$, we restart, and the $(nev_1 + nev_2) \times (nev_1 + nev_2)$ block of $D$ will have either diagonal elements, when the eigenvalue is real, or $2 \times 2$ anti-diagonal blocks for each pair of conjugate eigenvalues, as in Eq. (4.2). So, after restarting with $vs = nev_1 + nev_2$ right eigenvectors $V_{(1:vs)}$ and $T_{(1:vs,1:vs)} = \mathrm{diag}(E_h)$, we reset the matrix $D$ to dimension $vs \times vs$, where $D = V_{(1:vs)}^H J V_{(1:vs)}$ and $V_{(1:vs)}$ are the right Ritz vectors. The left Ritz vectors are then given by $W_{(1:vs)} = JV_{(1:vs)} D_{(1:vs,1:vs)}^{-1}$. In order to compute $T_{(vs+1,1:vs)}$ and $T_{(1:vs,vs+1)}$, we need $Av_{vs+1}$ and $A^H w_{vs+1}$, where $v_{vs+1}$ and $w_{vs+1}$ are the new Lanczos vectors given by

$$ v_{vs+1} = \frac{1}{\|r_{j-1}\|} r_{j-1}, \quad w_{vs+1} = \frac{Jv_{vs+1}}{\langle v_{vs+1}, Jv_{vs+1} \rangle} = \frac{\|r_{j-1}\|}{\rho_{j-1}} Jr_{j-1}. \qquad (4.5) $$

These matrix-vector products can be computed as

$$ Av_{vs+1} = \frac{1}{\|r_{j-1}\|} Ar_{j-1} = \frac{1}{\|r_{j-1}\|} (Ap_{j-1} - \beta_{j-2} Ap_{j-2}), $$
$$ A^H w_{vs+1} = \frac{\|r_{j-1}\|}{\rho_{j-1}} A^H Jr_{j-1} = \frac{\|r_{j-1}\|}{\rho_{j-1}} JAr_{j-1} = \frac{\|r_{j-1}\|}{\rho_{j-1}} J(Ap_{j-1} - \beta_{j-2} Ap_{j-2}). \qquad (4.6) $$

With these relations, we can compute the new $T$ matrix elements using J-symmetry:

$$ T_{(vs+1,1:vs)} = w_{vs+1}^H A V_{(1:vs)} = (A^H w_{vs+1})^H V_{(1:vs)} = \frac{\|r_{j-1}\|}{\rho_{j-1}} (Ap_{j-1} - \beta_{j-2} Ap_{j-2})^H J V_{(1:vs)}, $$
$$ T_{(1:vs,vs+1)} = W_{(1:vs)}^H A v_{vs+1} = D_{(1:vs,1:vs)}^{-1} V_{(1:vs)}^H J A v_{vs+1} = \frac{1}{\|r_{j-1}\|} D_{(1:vs,1:vs)}^{-1} V_{(1:vs)}^H J (Ap_{j-1} - \beta_{j-2} Ap_{j-2}). \qquad (4.7) $$

Taking these simplifications into account, the J-symmetric eigBICG/QMR algorithm is given in Algorithms 7 and 8.

Algorithm 7 eigbicg/qmr-j(nev,m) Choose initial guess x 0 and set x qmr 0 = x bicg 0 = x 0. Compute r bicg 0 = r 0 = b Ax 0, and set p 0 = r bicg 0. ρ 0 =< r 0, Jr 0 >. (QMR) Set θ 0 = 0, τ 0 = r 0 2, q 0 = 0. Choose the number of eigenvalues to be computed nev, and the maximum subspace dimension m > 2nev + 2. Set vs = 0. for j = 1, 2,... till convergence do Compute Ap j 1. Compute σ j 1 =< Ap j 1, Jp j 1 > and α j 1 = ρj 1 If BiCG iterates are desired, set x bicg j if vs = m then Compute eigenvalues using Algorithm 8 end if σ j 1. = x bicg j 1 + α j 1p k 1. vs = vs + 1, v vs = 1 r r j 1 j 1, D(vs, vs) =< v vs, Jv vs >. Compute the diagonal T matrix elements: T vs,vs = { 1 α j 1, if j = 1. 1 α j 1 + βj 2 α j 2, if j > 1. if vs < m then Compute the off-diagonal T matrix elements: T vs,vs+1 = r j 1 r j β j 1 α j 1, T vs+1,vs = r j 1. r j 1 α j 1 end if Compute r bicg j = r bicg j 1 α j 1Ap j 1. (QMR) Compute θ j = rbicg j 2 τ j 1, ψ j = 1 ψ j α j 1 p j 1, x qmr j Set ρ j =< r bicg j = x qmr j 1 + q j., Jr bicg j Set p j = r bicg j + β j 1 p j 1. end for > and compute β j 1 = ρj ρ j 1. 1+θ j, τ j = τ j 1 θ j ψ j, q j = ψ j θ j 1 q j 1 + 17

Algorithm 8 eigbicg/qmr-j : part2 Compute the lowest nev1 left and right eigenvectors G (m), Y (m) and eigenvalues Λ (m) of T m. Choose nev 1 = nev or nev 1 = nev + 1 such that we have a subset of eigenvalues that are real or contains pairs of complex conjugate values. Compute the lowest nev left and right eigenvectors G (m 1), Y (m 1) and eigenvalues Λ (m 1) of T m 1. Similarly, choose nev 2 = nev or nev 2 = nev + 1. Extend the vectors G (m 1), Y (m 1) by appending zeros at the last row to get vectors G (m 1), Ỹ (m 1). define the set Y r = [Y1:nev1, m Ỹ 1:nev2 m 1 ] and Y l = [G m m 1 1:nev1, G 1:nev2 ], where each has dimension m (nev1 + nev2). Bi-orthogonalize Y r and Y l to get Ỹr, and Ỹl. Compute R11 = Ỹr(1 : nev1)dỹr(1 : nev1) and R22 = Ỹr(1 : nev2)dỹr(1 : nev2). Set ( ) R11 0 R = 0 R22 and compute R 1 = ( R11 1 0 ) 0 R22 1 Ỹ l DỸrR 1. Compute H = R 1 Ỹr H DT m Ỹ r. Symmetrize H with respect to R as H = (H +R 1 H H R)/2 in order to ensure that the eigenvalues of H are either real or pairs of complex conjugate values. Compute right eigenvectors Q r, and eigenvalues E h of H. Set the nev 1 + nev 2 Lanczos vectors to be the nev 1 + nev 2 Ritz vectors as V (1:nev1+nev 2) = V (1:m) Ỹ r Q r. Set T m = 0 and T (1:nev1+nev 2,1:nev 1+nev 2) = diag(e h ). Set vs = nev 1 + nev 2. Set D = 0 then set D (1:vs,1:vs) = V(1:vs) H JV (1:vs). T (vs+1,1:vs) = rj 1 ρ j 1 (Ap j 1 β j 2 Ap j 2 ) H JV (1:vs). T (1:vs,vs+1) = 1 r j 1 D 1 (1:vs,1:vs) V (1:vs) H J(Ap j 1 β j 2 Ap j 2 ). 18

5. Systems with multiple right-hand sides. A main motivation for developing the eigBICG algorithm is to use the computed small eigenvalues to speed up the solution of systems with multiple right-hand sides. The advantage of eigBICG is that the eigenvalues are computed while the linear system is being solved, so no time is wasted calling a separate eigensolver. EigBICG, however, can compute only a few eigenvalues, and the accuracy of the computed eigenvectors may not be good enough for effective deflation of subsequent systems. Because of these limitations, we use a method similar to the one used with eigCG in the symmetric case. Consider the linear systems

$$ Ax^k = b^k, \quad k = 1, 2, \ldots, n_r, \qquad (5.1) $$

where $A$ is a non-Hermitian matrix. We divide the solution process into two phases. In the first phase, we solve a subset $n_1$ of the systems using eigBICG. Left and right deflation subspaces $U_l$ and $U_r$ are built incrementally from the new left and right eigenvectors computed with eigBICG for each system. The new eigenvectors are appended to the current deflation subspaces and bi-orthogonalized against them; the result is then used to deflate the next right-hand side. At the end of the first phase, we have accumulated bi-orthogonal deflation subspaces of dimension $n_1 \cdot nev$. The final subspaces $U_l$ and $U_r$ are then used in the second phase to deflate BiCGSTAB. They can also be used to compute approximations to $n_1 \cdot nev$ left and right eigenvectors. In both phases, deflation is done using the left-right projection; it is also possible to use a Galerkin or Minimal Residual projection, as will be discussed later. The algorithm for systems with multiple right-hand sides for a general non-symmetric, non-defective matrix $A$ is called Incremental eigBICG and is given in Algorithm 9.

Algorithm 9: Incremental eigBICG/QMR
  Set the initial guesses $x_0^k = 0$ for $k = 1, 2, \ldots, n_r$.
  Initialize the left and right subspaces $U_l = [\,]$, $U_r = [\,]$, and $H = [\,]$.
  Choose the maximum subspace dimension $m$ and the number of eigenvectors $nev$ to be computed per right-hand side, such that $m > 2nev$.
  for $k = 1, 2, \ldots, n_1$ do
    if $U_r$ is not empty then set $x_0^k = U_r H^{-1} U_l^H b^k$. end if
    Solve the system $Ax^k = b^k$ with $x_0^k$ as initial guess using eigBICG/QMR, obtaining $nev$ approximate right eigenvectors $V$ and left eigenvectors $W$.
    Compute $[\tilde{V}, \tilde{W}]$ by bi-orthogonalizing $[V, W]$ against $[U_r, U_l]$.
    Compute the new $H$:

$$ H = \begin{pmatrix} H & U_l^H A \tilde{V} \\ \tilde{W}^H A U_r & \tilde{W}^H A \tilde{V} \end{pmatrix} $$

    Append the newly obtained vectors: $U_l = [U_l\ \tilde{W}]$ and $U_r = [U_r\ \tilde{V}]$.
  end for
  for $k = n_1 + 1, n_1 + 2, \ldots, n_r$ do
    Set $x_0^k = U_r H^{-1} U_l^H b^k$.
    Solve the system $Ax^k = b^k$ with $x_0^k$ as initial guess using BiCGSTAB.
  end for
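The deflation step $x_0 = U_r H^{-1} U_l^H b$ of Algorithm 9 is an oblique (left-right) projection; a small sketch with hypothetical helper names:

```python
import numpy as np

def deflated_guess(Ur, Ul, H, b):
    # Left-right projection of Algorithm 9: x0 = Ur H^{-1} Ul^H b,
    # with H = Ul^H A Ur accumulated over the first-phase solves.
    return Ur @ np.linalg.solve(H, Ul.conj().T @ b)

def expand_deflation_basis(A, Ur, Ul, H, Vnew, Wnew):
    # Append freshly computed (already bi-orthogonalized against Ur, Ul)
    # eigenvector approximations and grow H block-wise, as in Algorithm 9.
    # Assumes non-empty Ur/Ul/H; the very first right-hand side skips this.
    AV = A @ Vnew
    H = np.block([[H,                        Ul.conj().T @ AV],
                  [Wnew.conj().T @ (A @ Ur), Wnew.conj().T @ AV]])
    return np.hstack([Ur, Vnew]), np.hstack([Ul, Wnew]), H
```

Because $H$ is small ($n_1 nev \times n_1 nev$), the projection costs only dense operations plus two tall-skinny products per right-hand side.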

5.1. J-symmetric Incremental eigBICG/QMR. For J-symmetric matrices, we have given a version of the eigBICG/QMR algorithm that replaces the expensive matrix-vector multiplication with $A^H$ by the cheaper multiplication with $J$. The J-symmetric eigBICG/QMR also saves memory by not storing the left eigenvectors, as they can be obtained from the right eigenvectors through the relation

$$ Z_L = JZ_R C^{-1}, \quad C = Z_R^H JZ_R. \qquad (5.2) $$

When using Equation (5.2), we have to ensure that whenever an eigenvalue is included, its complex conjugate is also included. Equation (5.2) also takes care of the appropriate normalization $Z_L^H Z_R = I$, i.e., it gives bi-orthogonal sets. There are options for avoiding the storage of the left vectors in the incremental part as well, but these are still under development and will be discussed in the future. For this report, we implement the incremental part in a fashion similar to the case where $A$ is not J-symmetric. The J-symmetric Incremental eigBICG/QMR algorithm is given in Algorithm 10.

Algorithm 10: J-symmetric Incremental eigBICG/QMR
  Set the initial guesses $x_0^k = 0$ for $k = 1, 2, \ldots, n_r$.
  Initialize $U_r = [\,]$, $U_l = [\,]$, and $H = [\,]$.
  Choose the maximum subspace dimension $m$ and the number of eigenvectors $nev$ to be computed per right-hand side, such that $m > 2nev + 2$.
  for $k = 1, 2, \ldots, n_1$ do
    if $U_r$ is not empty then set $x_0^k = U_r H^{-1} U_l^H b^k$. end if
    Solve the system $Ax^k = b^k$ with $x_0^k$ as initial guess using J-symmetric eigBICG/QMR, obtaining $nev$ or $nev+1$ approximate right eigenvectors $V$ and left eigenvectors $W = JVC^{-1}$, where $C = V^H JV$.
    Bi-orthogonalize $V$ and $W$ against $U_r$ and $U_l$ to get the new vectors $\tilde{V}$ and $\tilde{W}$.
    Compute the new $H$:

$$ H = \begin{pmatrix} H & U_l^H A \tilde{V} \\ \tilde{W}^H A U_r & \tilde{W}^H A \tilde{V} \end{pmatrix} $$

    Append the newly obtained vectors: $U_l = [U_l\ \tilde{W}]$ and $U_r = [U_r\ \tilde{V}]$.
  end for
  for $k = n_1 + 1, n_1 + 2, \ldots, n_r$ do
    Set $x_0^k = U_r H^{-1} U_l^H b^k$.
    Solve the system $Ax^k = b^k$ with $x_0^k$ as initial guess using BiCGSTAB.
  end for
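Equation (5.2) in code is one application of $J$ plus a small solve; a sketch assuming the columns of $Z_R$ are already closed under complex conjugation of the eigenvalues:

```python
import numpy as np

def left_from_right(J, ZR):
    # Z_L = J Z_R C^{-1} with C = Z_R^H J Z_R, Eq. (5.2); yields Z_L^H Z_R = I
    # provided the eigenvalue set is closed under complex conjugation.
    JZR = J(ZR) if callable(J) else J @ ZR
    C = ZR.conj().T @ JZR           # small, Hermitian
    return JZR @ np.linalg.inv(C)
```

This trades the storage of the left basis for one application of $J$ per use, which is the memory saving exploited above.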

6. Test matrices. In order to study the algorithms presented here, we use test matrices from different applications; we describe them in this section. We chose matrices for which the BiCG and BiCGSTAB algorithms converge, and some of them have been used in studies of other algorithms for non-symmetric linear systems with multiple right-hand sides. Here we give the basic properties of these matrices and how they were obtained.

6.1. Matrix Market matrices. From the Matrix Market [28] and the University of Florida Sparse Matrix Collection [29], we chose the matrices listed in Table 6.1.

Table 6.1: Test matrices from the Matrix Market and the University of Florida Sparse Matrix Collection.

matrix name     | size   | type                 | source | Is A + A^H definite?
sherman4        | 1,104  | real nonsymmetric    | [28]   | no
orsreg_1        | 2,205  | real nonsymmetric    | [28]   | no
pde2961         | 2,961  | real nonsymmetric    | [28]   | yes
raefsky1        | 3,242  | real nonsymmetric    | [29]   | yes
light_in_tissue | 29,282 | complex nonsymmetric | [29]   | yes

6.2. Partial differential operator. Two matrices correspond to the five-point discretization of the operator

$$ L(u) = -u_{xx} - u_{yy} + \beta(u_x + u_y) \qquad (6.1) $$

on the unit square with homogeneous Dirichlet boundary conditions. First-order derivatives are discretized by central differences. The two matrices, SGb1 and SGb100, correspond to the parameters $\beta = 1$ and $\beta = 100$, respectively. The discretization uses a grid spacing of $h = \frac{1}{51}$, yielding matrices of size $n = 2{,}500$. These matrices are real nonsymmetric with positive definite symmetric part. They were generated with SPARSKIT [30] and were used as test matrices in [31].

6.3. Convection-diffusion. We consider the five-point discretization of the operator

$$ -u_{xx} - u_{yy} + cu_x \qquad (6.2) $$

on the unit square with the boundary conditions

$$ u(x, 0) = u(0, y) = 0, \quad u(x, 1) = u(1, y) = 1. \qquad (6.3) $$

The discretization uses a grid spacing of $h = \frac{1}{41}$, yielding a matrix of size $1{,}600$. We use $c = 40$, which gives a real nonsymmetric system with positive definite symmetric part. The matrix was generated using SPARSKIT [30] and will be called CDiff42 in the following discussion.
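For reproducibility, the following sketch shows how such five-point operators can be assembled with SciPy (our stand-in for the SPARSKIT generator; the stencil follows Eq. (6.1) with central differences):

```python
import numpy as np
import scipy.sparse as sp

def five_point_operator(ngrid, beta):
    # -u_xx - u_yy + beta*(u_x + u_y) on the unit square, Dirichlet BCs,
    # central differences on an ngrid x ngrid interior grid, h = 1/(ngrid+1).
    h = 1.0 / (ngrid + 1)
    diag = (2.0 / h**2) * np.ones(ngrid)
    upper = (-1.0 / h**2 + beta / (2 * h)) * np.ones(ngrid - 1)
    lower = (-1.0 / h**2 - beta / (2 * h)) * np.ones(ngrid - 1)
    B = sp.diags([lower, diag, upper], [-1, 0, 1])   # 1-D stencil
    I = sp.identity(ngrid)
    # 2-D operator: the same 1-D stencil acting in x and in y
    return (sp.kron(I, B) + sp.kron(B, I)).tocsr()

A_SGb1 = five_point_operator(50, 1.0)     # n = 2500, beta = 1, h = 1/51
```

Using `five_point_operator(50, 100.0)` correspondingly yields an SGb100-like matrix, and the convection-diffusion case of Eq. (6.2) is the same construction with the first-derivative terms in one direction only.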

6.4. Lattice QCD examples. We consider two lattice QCD examples for the case of Wilson fermions near $\kappa_{critical}$. These two examples were used in [13] and were generated using the Chroma software package [32]. They correspond to lattice sizes $8^4$ and $12^4$. The size of the matrix is the number of lattice sites multiplied by 12, because of the 4 spin and 3 color degrees of freedom. In addition, due to the nearest-neighbor coupling, the matrix $A$ has the property that, with a red-black (or even-odd) ordering of the grid points, it takes the form

$$ A = I - \kappa D, \quad D = \begin{pmatrix} 0 & D_{oe} \\ D_{eo} & 0 \end{pmatrix}, \qquad (6.4) $$

where $\kappa$ is related to the bare quark mass; massless quarks correspond to a value of $\kappa$ called kappa critical, $\kappa_c$. This means that we can solve either the original system or the even-odd preconditioned system of half the size. The lattice QCD matrices used here are listed in Table 6.2.

Table 6.2: Lattice QCD test matrices. L and T are the numbers of lattice points along the space and time directions, respectively.

matrix name | L  | T  | Temperature | kappa    | size    | type
QCD49K      | 8  | 8  | 1/5.5       | 0.18182  | 49,152  | complex nonsymmetric
QCD49K-eo   | 8  | 8  | 1/5.5       | 0.18182  | 24,576  | complex nonsymmetric
QCD249K     | 12 | 12 | 1/5.8       | 0.163934 | 248,832 | complex nonsymmetric
QCD249K-eo  | 12 | 12 | 1/5.8       | 0.163934 | 124,416 | complex nonsymmetric
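The even-odd reduction implied by Eq. (6.4) is the standard Schur complement; eliminating the odd sites gives the half-size system (a short derivation, in the notation of Eq. (6.4); this is presumably how the "-eo" matrices in Table 6.2 arise):

$$ \begin{pmatrix} I & -\kappa D_{oe} \\ -\kappa D_{eo} & I \end{pmatrix} \begin{pmatrix} x_e \\ x_o \end{pmatrix} = \begin{pmatrix} b_e \\ b_o \end{pmatrix} \;\Longrightarrow\; \begin{cases} \left(I - \kappa^2 D_{oe} D_{eo}\right) x_e = b_e + \kappa D_{oe} b_o, \\ x_o = b_o + \kappa D_{eo} x_e, \end{cases} $$

so the even-odd preconditioned operator $I - \kappa^2 D_{oe} D_{eo}$ acts on one checkerboard only, which halves the dimension, as seen in Table 6.2.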

7. EigBICG results. EigBICG differs from BiCG in how it computes eigenvalues using only a subset of the Krylov subspace vectors, so we first discuss the quality of the eigenvalues computed with eigBICG. The accuracy of the eigenvalues obtained with the eigBICG algorithm depends on three parameters:
- nev: the number of eigenvalues to be computed;
- m: the dimension of the subspace that will be stored;
- tol: the convergence tolerance for the linear system.
We discuss the effect of these parameters on the accuracy of the computed eigenvectors for different test matrices.

7.1. Dependence on nev. Tables 7.1-7.7 show how the residuals of the lowest five eigenvalues depend on $nev$. We fix $tol$ and also fix the margin $m - 2nev$ of extra vectors used in computing eigenvectors. From these results, there appears to be an optimal choice of $nev$ that depends on the size of the problem. In the case of sherman4, the value $nev = 5$ gives the best results, and increasing $nev$ leads to less accurate eigenvalues. For SGb1, $nev = 5$ still gives results as good as those obtained with larger $nev$ values. As the problem size increases, as for light_in_tissue, $nev = 10$ gives the best results.

Table 7.1: Lowest 5 eigenvalues and their residuals obtained with eigBICG for sherman4, with tol = 1e-08 for the linear system. The linear system converged in 134 iterations. Eigenvalues are listed as (real, imaginary) pairs.

nev, m | Eigenvalues              | Residuals
5, 20  | (3.073e-02, 0)           | 1.93e-09
       | (8.470e-02, 0)           | 9.62e-08
       | (2.776e-01, 0)           | 1.22e-05
       | (3.988e-01, 0)           | 2.49e-04
       | (4.315e-01, 0)           | 8.80e-04
10, 30 | (3.073e-02, -2.730e-14)  | 2.15e-03
       | (8.470e-02, -1.720e-12)  | 1.85e-02
       | (2.780e-01, -3.083e-11)  | 9.95e-02
       | (4.020e-01, -1.027e-09)  | 2.46e-01
       | (4.313e-01, 2.770e-09)   | 5.46e-01
15, 40 | (3.073e-02, -8.309e-07)  | 1.29e-02
       | (8.480e-02, -8.222e-05)  | 1.29e-01
       | (2.963e-01, -1.171e-02)  | 1.67e+00
       | (4.133e-01, -7.033e-03)  | 1.68e+00
       | (6.125e-01, -2.410e-02)  | 6.56e+00

Table 7.2: Lowest 5 eigenvalues and their residuals obtained with eigBICG for SGb1, with tol = 1e-07 for the linear system. The linear system converged in 158 iterations.

nev, m | Eigenvalues     | Residuals
5, 20  | (7.779e-03, 0)  | 9.62e-10
       | (1.914e-02, 0)  | 3.29e-07
       | (3.051e-02, 0)  | 4.08e-05
       | (3.804e-02, 0)  | 8.52e-06
       | (4.940e-02, 0)  | 3.09e-04
10, 30 | (7.779e-03, 0)  | 9.31e-10
       | (1.914e-02, 0)  | 2.10e-07
       | (3.051e-02, 0)  | 4.07e-05
       | (3.804e-02, 0)  | 8.52e-06
       | (4.940e-02, 0)  | 3.10e-04
15, 40 | (7.779e-03, 0)  | 9.31e-10
       | (1.914e-02, 0)  | 2.10e-07
       | (3.051e-02, 0)  | 4.07e-05
       | (3.804e-02, 0)  | 8.52e-06
       | (4.940e-02, 0)  | 3.10e-04
20, 50 | (7.779e-03, 0)  | 9.31e-10
       | (1.914e-02, 0)  | 2.10e-07
       | (3.051e-02, 0)  | 4.07e-05
       | (3.804e-02, 0)  | 8.52e-06
       | (4.940e-02, 0)  | 3.10e-04

Table 7.3: Lowest 5 eigenvalues and their residuals obtained with eigBICG for light_in_tissue, with tol = 1e-08 for the linear system. The linear system converged in 431 iterations.

nev, m | Eigenvalues              | Residuals
5, 20  | (3.402e-04, -1.466e-05)  | 3.90e-10
       | (8.502e-04, -1.488e-05)  | 4.15e-07
       | (1.358e-03, -1.516e-05)  | 1.18e-05
       | (1.720e-03, -1.528e-05)  | 7.88e-07
       | (2.221e-03, -1.569e-05)  | 3.18e-04
10, 30 | (3.402e-04, -1.466e-05)  | 5.53e-11
       | (8.502e-04, -1.488e-05)  | 4.09e-07
       | (1.358e-03, -1.516e-05)  | 1.18e-05
       | (1.720e-03, -1.528e-05)  | 7.88e-07
       | (2.221e-03, -1.569e-05)  | 3.18e-04
15, 40 | (3.402e-04, -1.466e-05)  | 5.53e-11
       | (8.502e-04, -1.488e-05)  | 4.08e-07
       | (1.358e-03, -1.516e-05)  | 1.18e-05
       | (1.720e-03, -1.528e-05)  | 7.88e-07
       | (2.221e-03, -1.569e-05)  | 3.16e-04
20, 50 | (3.402e-04, -1.466e-05)  | 2.74e-04
       | (8.502e-04, -1.488e-05)  | 2.84e-01
       | (1.358e-03, -1.516e-05)  | 1.87e+00
       | (1.720e-03, -1.528e-05)  | 3.15e-01
       | (2.224e-03, -1.553e-05)  | 1.96e+00

Table 7.4: Lowest 5 eigenvalues and their residuals obtained with eigBICG for QCD49K-eo, with tol = 1e-08 for the linear system. The linear system converged in 295 iterations.

nev, m | Eigenvalues              | Residuals
5, 20  | (5.104e-02, -4.288e-06)  | 8.92e-03
       | (5.066e-02, -2.730e-02)  | 4.56e-01
       | (8.103e-02, -1.165e-02)  | 4.71e-01
       | (8.207e-02, -8.010e-04)  | 2.28e-01
       | (6.843e-02, 7.479e-02)   | 3.96e-02
10, 30 | (4.257e-03, 1.415e-02)   | 1.12e+00
       | (5.104e-02, -2.630e-08)  | 1.46e-03
       | (8.103e-02, -1.165e-02)  | 4.70e-01
       | (8.208e-02, -8.106e-04)  | 2.28e-01
       | (2.640e-02, -9.333e-02)  | 1.24e+00
15, 40 | (5.104e-02, -2.630e-08)  | 1.45e-03
       | (8.103e-02, -1.165e-02)  | 4.70e-01
       | (8.208e-02, -8.106e-04)  | 2.28e-01
       | (6.843e-02, -7.485e-02)  | 4.12e-02
       | (6.846e-02, 7.483e-02)   | 1.59e-02
20, 50 | (3.236e-02, -2.322e-03)  | 9.94e-01
       | (5.104e-02, -2.630e-08)  | 1.45e-03
       | (8.103e-02, -1.165e-02)  | 4.70e-01
       | (8.208e-02, -8.106e-04)  | 2.28e-01
       | (6.843e-02, -7.485e-02)  | 4.11e-02