Deflation for inversion with multiple right-hand sides in QCD

A Stathopoulos^1, A M Abdel-Rehim^1 and K Orginos^2
^1 Department of Computer Science, College of William and Mary, Williamsburg, VA 23187
^2 Department of Physics, College of William and Mary, Williamsburg, VA 23187
E-mail: andreas@cs.wm.edu, amrehim@cs.wm.edu, kostas@wm.edu

Abstract. Most calculations in lattice Quantum Chromodynamics (QCD) involve the solution of a series of linear systems of equations with exceedingly large matrices and a large number of right hand sides. Iterative methods for these problems can be sped up significantly if we deflate approximations of appropriate invariant spaces from the initial guesses. Recently we have developed eigCG, a modification of the Conjugate Gradient (CG) method, which while solving a linear system can reuse a window of the CG vectors to compute eigenvectors almost as accurately as the Lanczos method. The number of approximate eigenvectors can increase as more systems are solved. In this paper we review some of the characteristics of eigCG and show how it helps remove the critical slowdown in QCD calculations. Moreover, we study scaling with lattice volume and an extension of the technique to nonsymmetric problems.

1. Introduction

Lattice QCD is one of the most computationally demanding applications. The heart of the computation is the solution of the lattice Dirac equation on a Euclidean space-time lattice [1], which translates to a linear system of equations Mx = b, often for a large number of right hand sides [2]. Because of the large matrix size, iterative methods are the only way to solve the problem. The Dirac operator M is γ5-Hermitian, i.e., γ5 M = M^H γ5 for the γ5 Dirac matrix, but it is often preferable to solve the normal equations problem with A = M^H M. Also, M = m_q I - D, where m_q is a parameter related to the quark mass and D is an operator. Beyond the very large dimension and number of right hand sides, M becomes increasingly ill-conditioned as m_q → m_critical. In lattice QCD this is known as critical slowdown, and it is a limiting computational factor (a short derivation of its effect on the normal equations is sketched at the end of this section).

The solution of linear systems with many right hand sides is a challenging problem. Iterative methods that solve each system one by one tend to regenerate search directions within previously explored subspaces, thus wasting iterations. Seed and block methods are often used to share information between systems [3, 4, 5]. For Hermitian matrices, the most useful information lies in the invariant subspaces near zero. Deflating these eigenpairs significantly improves the conditioning seen by iterative methods. Such ideas have been tried in QCD [6], but effective deflation methods have appeared only recently. Moreover, one does not want to precompute the required eigenpairs through an eigensolver, but to reuse the information built by the linear solvers. For non-Hermitian matrices, the GMRESDR method [7, 8] computes approximate eigenspace information during GMRES(m). In GMRESDR, when the GMRES basis reaches m vectors, it is restarted not only with the residual of the approximate solution of the linear system but also with the nev < m smallest magnitude harmonic Ritz pairs. Therefore, being equivalent to a version of the Implicitly Restarted Arnoldi method, GMRESDR solves the linear system and at the same time converges to the nev smallest magnitude eigenpairs.

The GMRESDR approach transfers also to the Hermitian case. For Hermitian systems, however, GMRESDR is much more expensive per step than CG and, more importantly, because it restarts every m steps, it impairs both the optimal convergence of CG and the eigenvalue convergence of unrestarted Lanczos. Recently, a Recycled MINRES method was developed in [9], which reuses a window of m iteration vectors of MINRES to approximate nev eigenvectors. When m vectors have been reached, the window is restarted similarly to GMRESDR, but MINRES itself is not restarted. Our independently developed eigCG method similarly keeps a window of m vectors without restarting CG, but restarts the window in a locally optimal way, thus achieving nearly optimal convergence for the eigenvalues [10].
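To make the conditioning remark above concrete, here is the promised sketch; it assumes the standard properties γ5^H = γ5 and γ5^2 = I, which the text does not spell out:

    γ5 M = M^H γ5  ⟹  (γ5 M)^H = γ5 M  and  A = M^H M = (γ5 M)^2,

so the normal equations operator is the square of the Hermitian operator γ5 M. It is Hermitian and, for nonsingular M, positive definite, with eigenvalues the squared singular values of M:

    λ_i(A) = σ_i(M)^2,    κ(A) = σ_max(M)^2 / σ_min(M)^2 = κ(M)^2.

As m_q → m_critical, σ_min(M) → 0, so the conditioning of the normal equations degrades quadratically; this is the slowdown that deflating the lowest eigenpairs targets.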

2. EigCG and incremental EigCG

It is well known that the residuals produced by the CG iteration are collinear with the Lanczos iteration vectors, and that the Lanczos tridiagonal projection matrix can be obtained from the CG coefficients [11]. Therefore, if we kept all the CG vectors V, we could solve the projected eigenvalue problem on V^H A V and obtain approximate eigenpairs that satisfy a certain optimality. This, however, would imply unlimited storage for V, or it would require a second CG pass to recompute V on the fly. Moreover, spurious eigenvalues would have to be dealt with.

The idea in our eigCG method is to use a limited memory window V of CG residuals as an eigenvector search space, to keep track of the lowest nev Ritz vectors of the matrix A. When V grows to m > 2 nev vectors, we restart it as follows: we combine the nev lowest Ritz vectors u_i^(m), i = 1, ..., nev, at the m-th step (thick restarting) with their nev locally optimal counterparts from the previous step (u_i^(m-1) from step m-1), and restart the basis V with an orthonormal basis for this set [12]:

    [ u_1^(m), u_2^(m), ..., u_nev^(m), u_1^(m-1), ..., u_nev^(m-1) ].    (1)

In contrast to a restarted eigensolver, we do not use V to determine the next search direction. Instead, we leverage a window of the following m - 2 nev residuals computed by the unrestarted CG to extend the search space. The linear system convergence is unaffected. The proposed algorithm, eigCG, can be implemented on top of CG in an efficient way that does not require extra matrix-vector multiplications or long vector orthogonalizations, and that is fully parallelizable. For details the reader is referred to [10].

Once a set of approximate eigenvectors U has been computed, we can deflate subsequent CG runs by using the init-CG approach [13]. Given an initial guess x_0, we consider another guess whose U components are removed through the following Galerkin oblique projection:

    x'_0 = x_0 + U (U^H A U)^{-1} U^H (b - A x_0).    (2)

Then x'_0 is passed as the initial guess to CG. Init-CG converges similarly to a deflated system until the linear system error approaches the eigenvalue error in U. At that point we restart init-CG, re-applying the initial deflation (2). One or two restarts are sufficient to obtain linear convergence, equaling the convergence of a linear system deflated with accurately computed eigenvectors. Solving one linear system, however, would rarely yield enough eigenvector approximations to effectively deflate subsequent systems.
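For illustration, here is a minimal numpy sketch of the two projections above. The function names are ours, and the explicit products with A are a simplification: the actual eigCG assembles the small projected matrices from the CG coefficients, without the extra matrix-vector multiplications shown here.

    import numpy as np

    def deflated_guess(A, b, x0, U):
        # Galerkin oblique projection of eq. (2): remove the U components
        # of the initial error before handing the guess to CG (init-CG).
        r0 = b - A @ x0
        H = U.conj().T @ (A @ U)              # small nev x nev matrix U^H A U
        return x0 + U @ np.linalg.solve(H, U.conj().T @ r0)

    def restart_window(A, V, nev):
        # Thick plus locally optimal restart of eq. (1): combine the nev
        # lowest Ritz vectors of the current window with those of the
        # window one step earlier, then orthonormalize the combined set.
        def lowest_ritz(Vk):
            Hk = Vk.conj().T @ (A @ Vk)       # Rayleigh-Ritz projection
            _, S = np.linalg.eigh(Hk)         # eigenvalues in ascending order
            return Vk @ S[:, :nev]
        Y = np.hstack([lowest_ritz(V), lowest_ritz(V[:, :-1])])
        Q, _ = np.linalg.qr(Y)                # at most 2*nev orthonormal columns
        return Q

In eigCG the two Rayleigh-Ritz problems share one small projection matrix and its leading principal submatrix, and the combined set must be orthonormalized with care, because the two groups of Ritz vectors become nearly linearly dependent as they converge.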
Our Incremental eigCG employs a simple outer scheme that calls eigCG for n_1 < s of the s right hand sides and accumulates the resulting nev approximate Ritz vectors from each run into a larger set U (a toy version of this outer loop is sketched below). While solving each new system, this incremental scheme computes additional Ritz vectors and also improves those that had not sufficiently converged during previous systems. In all our experiments, and with modest values of nev, eigCG has demonstrated the surprising property of converging to the nev targeted eigenvectors with a convergence rate identical to that of unrestarted Lanczos. For example, a parameter nev = 8 has been sufficient for this behavior, while increasing m (the window size) offers only computational, not convergence, benefits. Several experiments in [10] study various aspects of this behavior.
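The following self-contained sketch shows the shape of the outer scheme; all names are ours. The inner solver is a deliberately simplified stand-in for eigCG: plain CG that stores all of its residuals and performs one Rayleigh-Ritz extraction at the end, whereas the real eigCG keeps only the m-vector window described above and also refines vectors already in U.

    import numpy as np

    def cg_with_ritz(A, b, x0, nev, tol=1e-8, maxit=1000):
        # Plain CG from initial guess x0; keeps normalized residuals so
        # that nev approximate lowest eigenvectors can be extracted.
        x = x0.astype(complex)
        r = b - A @ x
        p = r.copy()
        R = [r / np.linalg.norm(r)]
        rr = (r.conj() @ r).real
        bnorm = np.linalg.norm(b)
        for _ in range(maxit):
            Ap = A @ p
            alpha = rr / (p.conj() @ Ap).real
            x = x + alpha * p
            r = r - alpha * Ap
            rr_new = (r.conj() @ r).real
            if np.sqrt(rr_new) < tol * bnorm:
                break
            R.append(r / np.sqrt(rr_new))
            p = r + (rr_new / rr) * p
            rr = rr_new
        if nev == 0:
            return x, np.zeros((len(b), 0), dtype=complex)
        V, _ = np.linalg.qr(np.column_stack(R))    # re-orthonormalize residuals
        _, S = np.linalg.eigh(V.conj().T @ (A @ V))
        return x, V @ S[:, :nev]                   # nev lowest Ritz vectors

    def incremental_eigcg(A, B, nev=4, n1=8):
        # Outer scheme: accumulate Ritz vectors from the first n1 solves
        # into U; deflate every subsequent solve with eq. (2).
        n, s = B.shape
        U = np.zeros((n, 0), dtype=complex)
        X = np.zeros((n, s), dtype=complex)
        for j in range(s):
            x0 = np.zeros(n, dtype=complex)
            if U.shape[1] > 0:
                H = U.conj().T @ (A @ U)
                x0 = U @ np.linalg.solve(H, U.conj().T @ B[:, j])
            X[:, j], Unew = cg_with_ritz(A, B[:, j], x0, nev if j < n1 else 0)
            U = np.hstack([U, Unew])   # the real algorithm also orthogonalizes
        return X, U                    # Unew against the current U

Once U captures the lowest eigenvectors, the deflated initial guesses should cut the iteration counts of the later systems substantially; this is the behavior quantified in the next section.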

[Figure 1 about here: convergence (residual norm) of 40 successive linear systems against the number of iterations (matrix-vector operations); Incremental eigCG on the first 24, then init-CG with restarting at 5e-5; case 24^3 × 64, mass = -0.418.]

Figure 1. Improving convergence for the first 24 linear systems with Incremental eigCG(10,100) (blue). After the first 24 systems, we use the 240 vectors in U to deflate init-CG for another 16 systems (red). Restarting is not used during the Incremental part, hence the knee in the convergence curves. Restarting once at 5e-5 restores linear convergence. This is the 10 million matrix with a quark mass very close to the critical mass, which causes the ill conditioning. The majority of the systems are solved with a speedup of 8.

3. Speeding up linear systems in QCD

We have implemented our algorithms in C and interfaced them with Chroma, a lattice QCD C++ software base developed at Jefferson National Lab [14]. We have run both on an 8-node, dual-socket, dual-core cluster with 4 GB of memory per node, and on 256 processors of the Cray XT4 at NERSC, Lawrence Berkeley National Lab. Our computations involve all-to-all propagators, which require the solution of several hundreds of right hand sides. Thus, we can afford to run Incremental eigCG on the first 24-32 systems, accumulating about 250 vectors in U. For the rest of the systems, we use this U to deflate init-CG as described earlier.

The Dirac matrices used in this paper come from two ensembles of an anisotropic, two-flavor dynamical Wilson fermion calculation. In both cases the sea pion mass is about 400(36) MeV and the lattice spacing is 0.108(7) fm. The anisotropy factor of the time direction is about 3, and the critical mass was determined to be -0.4188 in all cases. We consider three typical lattice sizes: one 16^3 × 64, which with the 12 variables per lattice point gives a matrix size of N = 12 * 16^3 * 64 = 3,145,728; one 24^3 × 64, for a matrix size of N = 10,616,832; and one 32^3 × 96, for a matrix size of N = 37,748,736.

Figure 1 shows a typical convergence behavior of Incremental eigCG for the 10 million matrix and a quark mass close to critical. The more ill-conditioned a matrix is, the more iterations it requires initially, so eigCG has enough time to identify more eigenvalues, and more accurately.

[Figure 2 about here: two panels, for matrix sizes 16 × 16 × 16 × 64 × 12 and 24 × 24 × 24 × 64 × 12, plotting iterations against the quark mass (from -0.425 to -0.4), with m_critical and the sea quark mass marked; curves for undeflated CG and for init-CG with 240 vectors.]

Figure 2. For each quark mass and its corresponding matrix, we plot the average number of iterations required by init-CG to solve the remaining systems. For comparison, the number of iterations of the non-deflated CG is reported. Critical slowdown is removed.

3.1. Removing critical slowdown in QCD

As the quark mass approaches m_critical, the conditioning of the system rapidly deteriorates. However, shifting cannot change the density of the eigenvalues: the ill conditioning is due simply to a small number of eigenvalues approaching zero. Therefore, if this small number of eigenvalues is captured and deflated, the conditioning of the matrix should be independent of the shift (quark mass): deflating the lowest k eigenvalues reduces the effective condition number from λ_max/λ_1 to λ_max/λ_{k+1}, and the latter varies little with the shift. It is for this reason that our Incremental eigCG successfully removes the critical slowdown that has long plagued calculations in this field. Figure 2 shows the effects of the critical slowdown on the original, undeflated CG, and how our method takes about the same number of iterations for any shift up to very close to the critical mass.

3.2. Scaling with volume

For the same configuration, increasing the lattice volume increases the matrix condition number. This volume scaling problem is much harder to address with deflation methods than critical slowdown, because the eigenvalue density increases. To achieve a constant condition number, we need to capture the same portion of the spectrum, which means computing more eigenvalues. Figure 3 shows the effect of deflating about 250 vectors on CG convergence for the three different lattice sizes. Note that deflated CG takes about 2.9 times more iterations on the largest lattice than on the smallest one, even though the volume increased by a factor of 12. By running Incremental eigCG on more systems we can remove this volume dependency, at the expense of storing more vectors. Figure 4 shows the effect of deflating with 256, 512 and 1024 vectors on the largest lattice. We can achieve the same number of CG iterations on the large lattice as on the smallest one by deflating with about 5 times more vectors. However, there are diminishing returns in time.

4. EigBICG: extending to the nonsymmetric case

Recently, we extended the eigCG idea to eigBICG, which is based on the nonsymmetric Lanczos iteration. The main difference is that we need to keep two windows, W and V, corresponding to the left and right residuals built by BICG. We restart these windows similarly to (1), using left and right Ritz vectors (see the sketch below). An Incremental eigBICG outer method accumulates the 2 nev vectors from each system to improve the accuracy of the eigenspace, which is then used to deflate the BICGSTAB method. Figure 5 shows the deflation improvements on two smaller lattices in Matlab.
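The following numpy/scipy sketch shows the two-sided (oblique) Rayleigh-Ritz step behind this two-window restart. The helper name is ours, and the explicit products with A are again a simplification of what the BICG recurrences provide.

    import numpy as np
    from scipy.linalg import eig, qr

    def restart_two_windows(A, V, W, nev):
        # Oblique Rayleigh-Ritz with right window V and left window W:
        # solve the small pencil (W^H A V) g = theta (W^H V) g.
        def ritz(Vk, Wk):
            Hm = Wk.conj().T @ (A @ Vk)
            Gm = Wk.conj().T @ Vk
            theta, gl, gr = eig(Hm, Gm, left=True, right=True)
            idx = np.argsort(np.abs(theta))[:nev]   # smallest magnitude
            return Vk @ gr[:, idx], Wk @ gl[:, idx]
        # Thick restart (step m) plus locally optimal vectors (step m-1),
        # mirroring eq. (1) for both windows.
        Uv1, Uw1 = ritz(V, W)
        Uv2, Uw2 = ritz(V[:, :-1], W[:, :-1])
        Vnew, _ = qr(np.hstack([Uv1, Uv2]), mode='economic')
        Wnew, _ = qr(np.hstack([Uw1, Uw2]), mode='economic')
        return Vnew, Wnew

Orthonormalizing each window separately, as here, is only a convenience; a real implementation would keep the two windows biorthogonal instead.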

[Figure 3 about here: three panels showing residual norm against the number of iterations for undeflated CG (blue) versus init-CG (red); after Incremental eigCG on 24 systems with init-CG restarted at 2.5e-5 (case 16^3 × 64, mass -0.4125) and at 5e-5 (case 24^3 × 64, mass -0.4125), and after Incremental eigCG(8,60) on 32 systems with init-CG restarted at 1e-6 (case 32^3 × 96, mass -0.4125).]

Figure 3. CG convergence improvement by deflating 240 vectors obtained through Incremental eigCG, for three lattices with increasing volumes. Volume slowdown is reduced but not removed.

    CG iterations
    Case    undeflated    240 vecs    256 vecs    512 vecs    1024 vecs
    3M      1628          215         -           -           -
    10M     1970          337         -           -           -
    37M     2135          -           624         415         278

Figure 4. The left graph (not reproduced here) shows the time to solve the 78 linear systems after solving the first 32, 64, 128 systems with Incremental eigCG. The table above shows the CG iterations each case took.

Although eigBICG converges similarly to unrestarted Lanczos, the Incremental part is less effective than in the symmetric case. Still, the nonsymmetric version is useful if the experiments involve much heavier quark masses, as shown in the figure.

[Figure 5 about here: top panels plot residual norm against iteration number for BICGSTAB and for deflated BICGSTAB (LR and Galerkin projections), with nev=3, m=8, n1=1, on two even-odd preconditioned quenched Wilson lattices, β = 5.5 on 8^4 with m_q = -1.25 and β = 5.8 on 12^4 with m_q = -0.95; bottom panels compare undeflated and deflated CG and BICGSTAB on the corresponding 49K and 249K matrices.]

Figure 5. Deflated BICGSTAB using the eigenspace from eigBICG, and comparison with eigCG.

5. Conclusions

We have outlined a successful method, eigCG, that has helped us speed up dramatically the linear system solves in QCD and remove the critical slowdown. Volume slowdown is reduced, but we have not yet reached lattice sizes where it will become the bottleneck. We have also presented a promising extension to the nonsymmetric case.

Acknowledgments

This work was supported by NSF grant CCF-0728915, the DOE Jefferson Lab, and the Jeffress Memorial Trust grant J-813.

References

[1] Wilson K G 1974 Phys. Rev. D 10 2445-2459
[2] Foley J, Juge K J, O'Cais A, Peardon M, Ryan S and Skullerud J I 2005 Comput. Phys. Commun. 172 145-162 (Preprint hep-lat/0505023)
[3] Simoncini V and Gallopoulos E 1995 SIAM J. Sci. Comput. 16 917-933
[4] Chan T F and Wan W 1997 SIAM J. Sci. Comput. 18 1698-1721
[5] O'Leary D P 1980 Lin. Alg. Appl. 29 293-322

[6] de Forcrand P 1996 Nucl. Phys. B (Proc. Suppl.) 47 228-235
[7] Morgan R B 2002 SIAM J. Sci. Comput. 24 20-37
[8] Morgan R and Wilcox W 2004 Deflated iterative methods for linear equations with multiple right-hand sides Tech. Rep. BU-HEPP-04-01 Baylor University
[9] Wang S, de Sturler E and Paulino G H 2007 International Journal for Numerical Methods in Engineering 69 2441-2468
[10] Stathopoulos A and Orginos K June 12, 2008 (original version July 1, 2007) Computing and deflating eigenvalues while solving multiple right hand side linear systems with an application to quantum chromodynamics Tech. Rep. arXiv:0707.0131v2 http://arxiv.org/pdf/0707.0131v2, revised version under review in SIAM J. Sci. Comput.
[11] Saad Y 2003 Iterative Methods for Sparse Linear Systems (Philadelphia, PA, USA: SIAM)
[12] Stathopoulos A 2007 SIAM Journal on Scientific Computing 29 481-514
[13] Saad Y, Yeung M, Erhel J and Guyomarc'h F 2000 SIAM J. Sci. Comput. 21 1909-1926
[14] Edwards R G and Joo B (SciDAC) 2005 Nucl. Phys. Proc. Suppl. 140 832 (Preprint hep-lat/0409003)