State-of-the-art numerical solution of large Hermitian eigenvalue problems. Andreas Stathopoulos


State-of-the-art numerical solution of large Hermitian eigenvalue problems
Andreas Stathopoulos
Computer Science Department and Computational Sciences Cluster, College of William and Mary
Acknowledgment: National Science Foundation [ 1 ]

The problem
Find numEvals eigenvalues λ_i and corresponding eigenvectors x_i: A x_i = λ_i x_i, i = 1 : numEvals
A is large, sparse, symmetric, N = O(10^6 - 10^8)
Applications: materials, structural, data mining, SVD, QCD, ...
QCD: accelerate linear systems with multiple right-hand sides; low-rank approximation of matrices
Only possible through iterative methods [ 2 ]
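
At this scale the matrix can only be touched through matrix-vector products. As a purely illustrative sketch (not from the talk), the SciPy call below computes a few of the smallest eigenpairs of a sparse symmetric 2-D Laplacian standing in for an application matrix; the grid size, count, and tolerance are arbitrary choices.

    import scipy.sparse as sp
    from scipy.sparse.linalg import eigsh

    n = 100                                              # grid size; N = n*n = 10,000 here
    T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
    A = sp.kron(sp.eye(n), T) + sp.kron(T, sp.eye(n))    # sparse, symmetric 2-D Laplacian

    # 10 smallest eigenpairs via implicitly restarted Lanczos (matrix-vector products only)
    evals, evecs = eigsh(A, k=10, which='SA', tol=1e-8)
    print(evals)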

Power method: the fundamental iterative method
Given initial guess v_0, the iteration for i = 1, 2, ...
    t = A v_{i-1}
    v_i = t / ||t||
converges to the largest-modulus eigenpair (λ_N, x_N), i.e., A^i v_0 / ||A^i v_0|| -> x_N, with rate |λ_{N-1}| / |λ_N|
+ Requires only matrix-vector multiplications
- Only for the largest eigenpair
- Slow! [ 3 ]
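
A minimal sketch of the power iteration above, assuming a dense symmetric NumPy matrix; the small diagonal test matrix and tolerance are illustrative only.

    import numpy as np

    def power_method(A, v0, maxiter=1000, tol=1e-8):
        """Power iteration: converges to the largest-modulus eigenpair of symmetric A."""
        v = v0 / np.linalg.norm(v0)
        for _ in range(maxiter):
            t = A @ v                          # only a matrix-vector product is needed
            lam = v @ t                        # Rayleigh quotient estimate of lambda_N
            if np.linalg.norm(t - lam * v) <= tol * abs(lam):
                break
            v = t / np.linalg.norm(t)
        return lam, v

    lam, v = power_method(np.diag([1.0, 2.0, 5.0, 10.0]), np.ones(4))
    print(lam)                                 # ~10.0, the largest eigenvalue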

Krylov methods: the prevailing technique
The Krylov space is the span of all power iterates:
    K_{m,v} = span{ v, A v, A^2 v, ..., A^{m-1} v } = { p(A) v : p polynomial of degree < m }
Compute V, an orthonormal basis for K_{m,v} (for numerical stability)
Compute approximations through Rayleigh-Ritz: x_i = V y_i, where V^T A V y_i = λ_i y_i
Arnoldi: the above process for non-symmetric matrices
Lanczos: a special case of Arnoldi for symmetric matrices [ 4 ]
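
A sketch of the orthonormal-basis-plus-Rayleigh-Ritz procedure described above, written with full reorthogonalization for clarity rather than the short Lanczos recurrence; the diagonal test matrix is illustrative.

    import numpy as np

    def krylov_rayleigh_ritz(A, v, m):
        """Build an orthonormal basis V of K_{m,v} and return Ritz pairs from V^T A V y = theta y."""
        n = v.size
        V = np.zeros((n, m))
        V[:, 0] = v / np.linalg.norm(v)
        for j in range(1, m):
            w = A @ V[:, j - 1]
            w -= V[:, :j] @ (V[:, :j].T @ w)       # orthogonalize against the previous basis
            V[:, j] = w / np.linalg.norm(w)
        theta, Y = np.linalg.eigh(V.T @ (A @ V))   # Rayleigh-Ritz projection
        return theta, V @ Y                        # Ritz values and Ritz vectors x = V y

    A = np.diag(np.arange(1.0, 101.0))
    theta, X = krylov_rayleigh_ritz(A, np.random.rand(100), 20)
    print(theta[-3:])                              # approximations to the largest eigenvalues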

Krylov methods: the prevailing technique
    K_{m,v} = span{ v, A v, A^2 v, ..., A^{m-1} v } = { p(A) v : p polynomial of degree < m }
+ Approximates extreme eigenpairs
+ Converges trivially in N steps
+ Optimal approximations over all polynomials
- Convergence rate depends on the relative separation of eigenvalues
- Slow for clustered eigenvalues and large sizes
- O(N m^2) orthogonalization cost, O(mN) storage [ 5 ]

Krylov: ideal for linear systems Ax = b
Conjugate Gradient (CG) uses a 3-term recurrence to build K_{m,v} and update the approximate solution
O(Nm) cost and O(3N) storage
Minimizes the A-norm of the error at every step
Preconditioning with M^{-1} ≈ A^{-1} is also easy: M^{-1} A x = M^{-1} b (PCG) [ 6 ]

Krylov: ideal for linear systems Ax = b
Conjugate Gradient (CG) uses a 3-term recurrence to build K_{m,v} and update the approximate solution
O(Nm) cost and O(3N) storage
Minimizes the A-norm of the error at every step
Preconditioning with M^{-1} ≈ A^{-1} is also easy: M^{-1} A x = M^{-1} b (PCG)
Note: the action of M^{-1} could be an iterative method itself! [ 7 ]
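
A sketch of preconditioned CG as outlined above, with the action of M^{-1} supplied as a callable (which, as the slide notes, could itself be an iterative method); the tridiagonal test matrix and Jacobi preconditioner are illustrative.

    import numpy as np

    def pcg(A, b, apply_Minv, tol=1e-10, maxiter=500):
        """Preconditioned Conjugate Gradient: 3-term recurrence, O(N) storage per vector."""
        x = np.zeros_like(b)
        r = b.copy()                            # r = b - A x for x = 0
        z = apply_Minv(r)
        p = z.copy()
        rz = r @ z
        for _ in range(maxiter):
            Ap = A @ p
            alpha = rz / (p @ Ap)
            x += alpha * p
            r -= alpha * Ap
            if np.linalg.norm(r) <= tol * np.linalg.norm(b):
                break
            z = apply_Minv(r)
            rz_new = r @ z
            p = z + (rz_new / rz) * p
            rz = rz_new
        return x

    n = 500
    A = np.diag(np.linspace(1.0, 1000.0, n)) + 0.1 * np.eye(n, k=1) + 0.1 * np.eye(n, k=-1)
    b = np.ones(n)
    x = pcg(A, b, apply_Minv=lambda r: r / np.diag(A))    # Jacobi preconditioner
    print(np.linalg.norm(A @ x - b))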

Lanczos problems
Lanczos' 3-term recurrence still requires O(Nm) storage
Unlike CG, orthogonality is important in Lanczos
Restarting to limit the basis size destroys optimality
Preconditioning is not obvious (M^{-1} A x = λ M^{-1} x is not an eigenproblem)
Goal: use PCG to derive nearly optimal eigenmethods with smaller bases [ 8 ]

Generalized Davidson: eigenvalue preconditioning
Let r = Ax - λx be the residual of an approximate eigenpair (λ, x)
Arnoldi/Lanczos expand the basis V by r; Generalized Davidson expands by the preconditioned r:
    repeat
        V = [V, M^{-1} r]              (append the preconditioned residual)
        V^T A V y = λ y,  x = V y      (Rayleigh-Ritz)
        x = x / ||x||                  (normalize)
        r = Ax - λx                    (new residual)
    until ||r|| < ε
No 3-term recurrence and a more expensive step, but much faster convergence [ 9 ]
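
A minimal sketch of the Generalized Davidson loop above for the smallest eigenpair, with the preconditioner passed as a callable; restarting is omitted so the basis simply grows, and the test matrix and diagonal preconditioner are illustrative.

    import numpy as np

    def generalized_davidson(A, apply_Minv, v0, tol=1e-8, maxiter=100):
        """Expand the search space by the preconditioned residual; extract with Rayleigh-Ritz."""
        V = (v0 / np.linalg.norm(v0)).reshape(-1, 1)
        for _ in range(maxiter):
            theta, Y = np.linalg.eigh(V.T @ (A @ V))    # Rayleigh-Ritz
            lam, x = theta[0], V @ Y[:, 0]              # smallest Ritz pair
            r = A @ x - lam * x                         # new residual
            if np.linalg.norm(r) < tol:
                break
            t = apply_Minv(r)                           # precondition the residual
            t -= V @ (V.T @ t)                          # keep the basis orthonormal
            V = np.hstack([V, (t / np.linalg.norm(t)).reshape(-1, 1)])
        return lam, x

    rng = np.random.default_rng(0)
    A = np.diag(np.linspace(1.0, 100.0, 100)) + 0.01 * rng.standard_normal((100, 100))
    A = (A + A.T) / 2
    lam, x = generalized_davidson(A, lambda r: r / np.diag(A), rng.standard_normal(100))
    print(lam)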

Inverse iteration (inverse power method)
Given initial guess v_0, the iteration for i = 1, 2, ...
    t = (A - σI)^{-1} v_{i-1}
    v_i = t / ||t||
converges to the eigenpair closest to σ
+ The closer σ is to λ_k, the faster the outer convergence rate |λ_k - σ| / |λ_{k+1} - σ|
- A direct factorization of A may be prohibitive
- An iterative method for the linear system may take long to converge [ 10 ]

Rayleigh Quotient Iteration
Given initial guess v_0, for i = 1, 2, ...
    σ = v_{i-1}^T A v_{i-1}
    t = (A - σI)^{-1} v_{i-1}
    v_i = t / ||t||
All the characteristics of inverse iteration, but also:
+ converges to the eigenpair cubically!
- if v_0 is not close to the required eigenvector it may misconverge [ 11 ]

Rayleigh Quotient Iteration
Given initial guess v_0, for i = 1, 2, ...
    σ = v_{i-1}^T A v_{i-1}
    t = (A - σI)^{-1} v_{i-1}
    v_i = t / ||t||
All the characteristics of inverse iteration, but also:
+ converges to the eigenpair cubically!
- if v_0 is not close to the required eigenvector it may misconverge
Eigenproblem as constrained minimization of the Rayleigh quotient λ = (x^T A x) / (x^T x):
RQI is equivalent to Newton's method on the unit-sphere manifold [ 12 ]
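
A minimal sketch of Rayleigh Quotient Iteration as written above; a dense direct solve stands in for the inner linear system, and the starting vector is deliberately chosen near the target eigenvector.

    import numpy as np

    def rqi(A, v0, maxiter=20, tol=1e-12):
        """Rayleigh Quotient Iteration: cubic local convergence for symmetric A."""
        v = v0 / np.linalg.norm(v0)
        for _ in range(maxiter):
            sigma = v @ (A @ v)                                     # Rayleigh quotient shift
            if np.linalg.norm(A @ v - sigma * v) < tol:
                break
            t = np.linalg.solve(A - sigma * np.eye(v.size), v)      # inner solve (A - sigma I) t = v
            v = t / np.linalg.norm(t)
        return sigma, v

    A = np.diag([1.0, 2.0, 3.0, 4.0]) + 0.05 * np.ones((4, 4))
    sigma, v = rqi(A, np.array([1.0, 0.2, 0.1, 0.05]))              # guess near the lowest eigenvector
    print(sigma)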

Inexact RQI, Newton, and Jacobi-Davidson
(A - σI) t = v_{i-1} must be solved accurately enough for RQI to converge
However, inexact (truncated) Newton does not require high accuracy
Newton:  x_{i+1} = x_i - Hess(x_i)^{-1} ∇(x_i)    (computes a correction)
RQI:     x_{i+1} = (A - σI)^{-1} x_i              (updates the approximation)
=> inexact RQI is not exactly inexact Newton! [ 13 ]

Inexact RQI, Newton, and Jacobi-Davidson
(A - σI) t = v_{i-1} must be solved quite accurately for RQI to converge
However, inexact (truncated) Newton does not require high accuracy
Newton:  x_{i+1} = x_i - Hess(x_i)^{-1} ∇(x_i)    (computes a correction)
RQI:     x_{i+1} = (A - σI)^{-1} x_i              (updates the approximation)
=> inexact RQI is not exactly inexact Newton!
Note ∇(x) = r = Ax - λx, the residual of (λ, x). Thus for the correction δ to x:
Jacobi-Davidson:  (I - xx^T)(A - ηI)(I - xx^T) δ = -r    (computes a correction)
=> JD ≈ inexact (truncated) Newton [ 14 ]
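
A sketch of one inexact Jacobi-Davidson correction as in the equation above: the projected, shifted operator is wrapped in a SciPy LinearOperator and the system is solved only approximately (truncated MINRES). The example matrix and inner iteration count are illustrative.

    import numpy as np
    from scipy.sparse.linalg import LinearOperator, minres

    def jd_correction(A, x, eta, r, inner_its=25):
        """Approximately solve (I - x x^T)(A - eta I)(I - x x^T) delta = -r with delta orthogonal to x."""
        proj = lambda v: v - x * (x @ v)                      # (I - x x^T) v
        op = lambda v: proj(A @ proj(v) - eta * proj(v))      # projected, shifted operator
        Aop = LinearOperator((x.size, x.size), matvec=op, dtype=float)
        delta, _ = minres(Aop, -proj(r), maxiter=inner_its)   # truncated (inexact) inner solve
        return proj(delta)

    # One JD step from a rough approximation of the smallest eigenpair:
    A = np.diag(np.linspace(1.0, 50.0, 100)); A[0, 1] = A[1, 0] = 0.3
    x = np.zeros(100); x[0] = 1.0; x[1] = 0.1; x /= np.linalg.norm(x)
    lam = x @ (A @ x)
    r = A @ x - lam * x
    delta = jd_correction(A, x, lam, r)
    x_new = (x + delta) / np.linalg.norm(x + delta)           # corrected approximation
    print(np.linalg.norm(A @ x_new - (x_new @ (A @ x_new)) * x_new))   # residual after the correction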

Equivalence in the general case
Let M ≈ A - σI be a preconditioner
Both GD and JD solve the correction equation approximately:
    Generalized Davidson as δ = M^{-1} r
    Jacobi-Davidson as δ = M_x^{-1} r (the preconditioner projected at x)
Both GD and JD are not single-vector iterations: they build a space!
GD, JD ≈ subspace-accelerated inexact Newton [ 15 ]

A different view: quasi-Newton approaches
Mild non-linearity: nonlinear CG is competitive [Bradbury & Fletcher '66, others]
Better: locally optimal LOBPCG [D'yakonov '83; Knyazev '91, '01]
    x_{i+1} = RayleighRitz( x_{i-1}, x_i, M^{-1} r_i ) [ 16 ]

A different view: quasi-Newton approaches
Mild non-linearity: nonlinear CG is competitive [Bradbury & Fletcher '66, others]
Better: locally optimal LOBPCG [D'yakonov '83; Knyazev '91, '01]
    x_{i+1} = RayleighRitz( x_{i-1}, x_i, M^{-1} r_i )
Subspace acceleration and recurrence restarting in GD [Murray et al. '92]
GD(k,m)+1: restart with [ x_{i-1}, x_i^{(1)}, ..., x_i^{(k)} ]  [AS '98, '99]
Direct analogy to limited-memory quasi-Newton methods:
    GD+1 accelerates LOBPCG as Broyden accelerates nonlinear CG [ 17 ]
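
A minimal single-vector iteration in the spirit of LOBPCG, following the recurrence above: each step does a Rayleigh-Ritz over span{ x_{i-1}, x_i, M^{-1} r_i }. It is run unpreconditioned here (M = I), and the diagonal test matrix is illustrative.

    import numpy as np

    def lobpcg_single(A, apply_Minv, x0, tol=1e-8, maxiter=500):
        """x_{i+1} = RayleighRitz(x_{i-1}, x_i, M^{-1} r_i) for the smallest eigenpair."""
        x, x_prev = x0 / np.linalg.norm(x0), None
        for _ in range(maxiter):
            lam = x @ (A @ x)
            r = A @ x - lam * x
            if np.linalg.norm(r) < tol:
                break
            cols = [x, apply_Minv(r)] + ([x_prev] if x_prev is not None else [])
            S, _ = np.linalg.qr(np.column_stack(cols))   # orthonormal basis of the (at most 3-dim) space
            theta, Y = np.linalg.eigh(S.T @ (A @ S))     # Rayleigh-Ritz on the small space
            x_prev, x = x, S @ Y[:, 0]                   # keep the smallest Ritz vector
        return lam, x

    A = np.diag(np.linspace(1.0, 500.0, 300))
    lam, x = lobpcg_single(A, lambda r: r, np.random.rand(300))   # M = I (no preconditioner)
    print(lam)                                                    # ~1.0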

So what is optimal? Work unit: matrix-vector products
[Figure (slides 18-23): methods placed on a plane of Newton inner iterations vs. total matvecs, with a memory axis: RQI; Inexact RQI / Jacobi-Davidson / Truncated Newton; IRA(m=2); Steepest Descent; Nonlinear CG / DACG / LOBPCG; limited-memory quasi-Newton / Generalized Davidson+1(m); Lanczos and QMRopt at the optimum.]
Optimal: unrestarted Lanczos or QMRopt (QMR solving (A - λI)x = 0)
All these methods can be implemented as JD with different parameters [ 18-23 ]

Stopping the inner JD iteration (Notay '02)
    (I - xx^T)(A - ηI)(I - xx^T) δ = -r
    ||r_eigen||^2 = ||r_Linear||^2 / f + g_k^2
[Figure: eigenvalue residual vs. linear-system residual over matrix-vector products when the inner iteration is run to convergence.] [ 24 ]

Our JDQMR extension to JDCG
Based on symmetric QMR [Freund & Nachtigal '94] with right preconditioning
JDQMR new features:
1. Can use indefinite preconditioners
2. Works for interior eigenpairs
3. Smooth residual convergence
4. Better stopping criteria
JDQMR improves robustness and efficiency [ 25 ]

JDQMR reduces wasted iterations
[Figure: BCSSTK09, Jacobi-Davidson with jmin=8, jmax=15, residual tolerance 1e-10. Left: sQMR on (A - λ_1 I)x = 0 vs. JDqmr with the dynamic stopping criterion. Right: with the dynamic stopping criterion the eigenvalue and linear-system residuals stay very close. X-axes: matrix-vector products.] [ 26 ]

One eigenvalue with preconditioning
NASASRB: note the plateaus
[Figure: residual norm vs. matvecs for the preconditioned methods QMRopt, GDfull, LOBPCG, GD+1, JDQMR, JDCG.] [ 27 ]

Sample runs: GD+k minimizes matvecs, JDQMR minimizes time
1 smallest eigenvalue of a Laplacian for large matrices
[Figure: matvecs and time in seconds for GD+1, JDqmr, BLOPEX, ARPACK on the 1M Laplacian with ILUT(4,0), 1M with no preconditioner, and 10M with no preconditioner.]
ARPACK for 1M: 525 matvecs, 220 seconds [ 28 ]

What is optimal for many eigenvalues?
Red: QMRopt (exact eigenvalues as shifts); Black: JDQMR, nearly optimal; Blue: subspace-accelerated GD+1, better than "optimal"?
[Figure: 494BUS, residual convergence vs. matrix-vector multiplications.] [ 29 ]

JD: projecting the locked vectors X and the Ritz vector x
    (I - XX^T) (A - ηI) K^{-1} ( I - X (X^T K^{-1} X)^{-1} X^T K^{-1} ) t = r
The left projection is needed for definiteness; is the skew right projection needed?
Orthogonalization requirements dominate for large numEvals, so solve:
    (A - ηI) t = r                          without preconditioning
    (I - XX^T)(A - ηI) K^{-1} t = r         with preconditioning
Other choices, JDQMR-(Left,Skew,Right): (111), (000), (101), (011), (100) [ 30 ]
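
A sketch of applying the fully projected, preconditioned operator from the equation above (left projector plus skew right projector, with the columns of X holding the locked vectors and the current Ritz vector); the returned callable could then be handed to a Krylov solver such as QMR. The function names and the K^{-1} callable are illustrative assumptions.

    import numpy as np

    def projected_operator(A, apply_Kinv, X, eta):
        """Return t -> (I - X X^T)(A - eta I) K^{-1} (I - X (X^T K^{-1} X)^{-1} X^T K^{-1}) t."""
        KinvX = np.column_stack([apply_Kinv(X[:, j]) for j in range(X.shape[1])])
        XtKinvX = X.T @ KinvX                                       # small (nev+1) x (nev+1) matrix
        def apply(t):
            u = apply_Kinv(t)                                       # K^{-1} t
            s = u - KinvX @ np.linalg.solve(XtKinvX, X.T @ u)       # K^{-1} times the skew-projected t
            y = A @ s - eta * s                                     # shifted operator
            return y - X @ (X.T @ y)                                # left projection (I - X X^T)
        return apply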

Scaling with numEvals, no preconditioner
Laplace 7-point, 125K, tol = 1e-15
[Figure: matvecs and time in seconds vs. number of smallest eigenvalues found (up to 100) for arpack, GD+2, jdqmr000, jdqmr100, jdqmr011, jdbsym.]
JDQMR-000 fastest among all PRIMME variants and ARPACK [ 31 ]

Scaling with numEvals, no preconditioner
cfd1, 70K, 26 nonzeros/row, tol = 1e-15
[Figure: matvecs and time in seconds vs. number of smallest eigenvalues found (up to 100) for arpack, GD+2, jdqmr000, jdqmr100, jdqmr011, jdbsym.]
ARPACK eventually better for large numEvals and denser matrices [ 32 ]

Scaling with numEvals, no preconditioner
Ratio ARPACK / JDQMR-000 for 8 matrices, tol = 1e-15
[Figure: matvec and time ratios ARPACK/JDQMR-000 vs. number of smallest eigenvalues found (5, 10, 50, 100) for Cone_A, Andrews, cfd1, Plate33K_A0, finan512, Lap7pt15K, Lap7pt125K, torsion1.]
JDQMR-000 faster for numEvals < 10; asymptotically it depends on sparsity [ 33 ]

Scaling with numEvals, ILUT(80,1e-4)
cfd1, 70K, tol = 1e-15
[Figure: preconditioner applications/matvecs and time in seconds vs. number of smallest eigenvalues found for arpack, GD+1, jdqmr100, jdqmr011, jdqmr111, jdbsym.]
The Q-projectors have no effect on the convergence of JDQMR [ 34 ]

Scaling with numEvals, ILUT(20,1e-6)
Laplace 7-point, 125K, tol = 1e-15
[Figure: preconditioner applications/matvecs and time in seconds vs. number of smallest eigenvalues found for arpack, GD+1, jdqmr100, jdqmr011, jdqmr111, jdbsym.]
With an expensive preconditioner, fewer matvecs means faster (GD+1) [ 35 ]

Software availability
PRIMME: PReconditioned Iterative MultiMethod Eigensolver, with my Ph.D. student J. R. McCombs
Full set of defaults for non-expert users
Full customizability for expert users
Near optimality through GD+k and JDQMR
Over 12 methods accessible through PRIMME
Parallel, high-performance implementation
C and Fortran interfaces, Matlab interface soon
Download: www.cs.wm.edu/~andreas [ 36 ]

Minimal interface (end user)
    #include "primme.h"
    primme_params primme;
    primme_initialize(&primme);
    primme.n = n;                          /* the matrix and its size have been read */
    primme.numEvals = 20;                  /* number of needed eigenvalues, smallest by default */
    primme.matrixMatvec = MV;              /* pointer to a block matrix-vector function MV(x,y,k) */
    primme.applyPreconditioner = PR;       /* pointer to a block precondition-vector function PR(x,y,k) */
    primme_set_method(method, &primme);
    ierr = dprimme(evals, evecs, rnorms, &primme);
primme_set_method choices: DYNAMIC, DEFAULT_MIN_TIME, DEFAULT_MIN_MATVECS, Arnoldi, GD, GD_plusK, GD_Olsen_plusK, JD_Olsen_plusK, RQI, JDQR, JDQMR, JDQMR_ETol, SUBSPACE_ITERATION, LOBPCG_OrthoBasis, LOBPCG_OrthoBasis_Window [ 37-40 ]

The full interface (advanced user)
    #include "primme.h"
    primme_params primme;
    primme.outputFile = stdout
    printLevel = 5
    numEvals = 10
    aNorm = 1.0
    eps = 1.0e-12
    maxBasisSize = 15
    minRestartSize = 7
    maxBlockSize = 1
    maxOuterIterations = 10000
    maxMatvecs = 300000
    target = primme_smallest
    numTargetShifts = 0
    targetShifts = 1.0 2.0
    locking = 1
    initSize = 0
    numOrthoConst = 0
    iseed = -1
    restarting.scheme = primme_thick
    restarting.maxPrevRetain = 1
    correction.precondition = 1
    correction.robustShifts = 1
    correction.maxInnerIterations = -1
    correction.relTolBase = 1.5
    correction.convTest = adaptive_ETolerance
    correction.projectors.LeftQ = 1
    correction.projectors.LeftX = 1
    correction.projectors.RightQ = 0
    correction.projectors.SkewQ = 0
    correction.projectors.RightX = 1
    correction.projectors.SkewX = 1
    matrixMatvec = MV(x,y,k)
    applyPreconditioner = PR(x,y,k)
    ierr = dprimme(evals, evecs, rnorms, &primme); [ 41 ]

Minimal Fortran interface
    include 'primme_f77.h'
    integer primme
    call primme_initialize_f77(primme)
    call primme_set_member_f77(primme, PRIMMEF77_n, n)
    call primme_set_member_f77(primme, PRIMMEF77_numEvals, 20)
    call primme_set_member_f77(primme, PRIMMEF77_matrixMatvec, MV)
    call primme_set_member_f77(primme, PRIMMEF77_applyPreconditioner, PR)
    call primme_set_method_f77(primme, method, bytesNeeded)
    call dprimme_f77(evals, evecs, rnorms, primme, ierr)
Similar to C:
    #include "primme.h"
    primme_params primme;
    primme_initialize(&primme);
    primme.n = n;
    primme.numEvals = 20;
    primme.matrixMatvec = MV;
    primme.applyPreconditioner = PR;
    primme_set_method(method, &primme);
    ierr = dprimme(evals, evecs, rnorms, &primme); [ 42 ]

In QCD we solve a sequence of linear systems. Can we use these Krylov spaces to
1. obtain eigenpairs?
2. use these eigenpairs to deflate and thus accelerate subsequent systems?
For restarted GMRES(m), the variant GMRES-DR(m) (≈ IRA(m)) computes eigenvalues while solving the system
- GMRES is expensive per iteration
- Restarting slows convergence for the linear system AND the eigenvectors
Can we be more effective with CG/Lanczos? [ 43 ]

Our recent work (with K. Orginos)
A small window V of m vectors keeps track of the nev < m smallest eigenvectors
V is expanded by the CG residuals
When m vectors fill V, it is restarted as in GD(nev,m)+nev
The CG iterates are unaffected
This records the Lanczos-vector contributions to the eigenvectors [ 44 ]

Does it find eigenvalues accurately?
[Figure: convergence of the lowest eigenvalue residual norm vs. CG iterations for eigcg(1,3), eigcg(3,9), eigcg(3,40), eigcg(8,17), eigcg(8,20), eigcg(8,40) and unrestarted Lanczos.] [ 45 ]

Does this find eigenvalues accurately? Identical to Lanczos
[Figure: residual convergence of the 8 smallest eigenpairs with eigcg(8,20) vs. CG iterations.] [ 46 ]

Incrementally improving accuracy and number of eigenvalues
Use the CG iterations for k subsequent right-hand sides to improve U:
Incremental eigCG
    U = [ ], Λ = [ ]                               // accumulated eigenpairs
    for i = 1 : k
        x_0 = U Λ^{-1} U^H b_i                     // the init-CG part
        [x_i, V, M] = eigcg(nev, m, A, x_0, b_i)   // eigCG with initial guess x_0
        [U, Λ] = RayleighRitz([U, V])
    end
Typical values: k = 12-24 (out of on the order of 100 right-hand sides), nev = 10, m = 40 [ 47 ]
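
A sketch of the init-CG part of the loop above: each new right-hand side is started from the Galerkin/deflation guess x_0 = U Λ^{-1} U^T b_i built from the accumulated approximate eigenpairs. Plain SciPy CG stands in for eigCG here (so U, Λ are not updated), and the example matrix and eigenpairs are illustrative.

    import numpy as np
    from scipy.sparse.linalg import cg

    def deflated_solves(A, B, U, Lam):
        """Solve A x = b_i for each column b_i of B, starting CG from x0 = U Lam^{-1} U^T b_i."""
        solutions = []
        for b in B.T:                              # right-hand sides are the columns of B
            x0 = U @ ((U.T @ b) / Lam)             # exact solve within span(U): deflates those components
            x, info = cg(A, b, x0=x0)
            solutions.append(x)
        return np.column_stack(solutions)

    # Example: deflate with the 5 lowest exact eigenpairs of a small SPD matrix
    n = 400
    A = np.diag(np.linspace(0.01, 10.0, n))
    Lam, U = np.linalg.eigh(A)
    B = np.random.rand(n, 3)
    X = deflated_solves(A, B, U[:, :5], Lam[:5])
    print(np.linalg.norm(A @ X - B))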

Convergence improves after every new CG
[Figure: residual norm ||r||_2 vs. CG iteration for successive right-hand sides, from 10^3 down to 10^-12.] [ 48 ]

A realistic experimental case
Lattice parameters: 2-flavor Wilson fermions
Lattice spacing a_s = 0.1 fm (spatial); anisotropic: a_t = a_s/3
Pion mass ~ 350-400 MeV
Two lattice sizes:
    16^3 x 64, for a matrix dimension of 3.1 million
    24^3 x 64, for a matrix dimension of 10.6 million
Currently running using Chroma at Jefferson Lab [ 49 ]

Case: 3.1 million (16 x 16 x 16 x 64 x 12)
[Figure: matvecs vs. quark mass for the original CG and init-CG with 240 evecs; the m_π L < 3 region and the sea quark mass are marked.] [ 50 ]

Case: 10.6 million (24 x 24 x 24 x 64 x 12)
[Figure: matvecs vs. quark mass for the original CG and init-CG with 240 evecs; the m_π L < 3 region and the sea quark mass are marked.] [ 51 ]

Conclusions
JDQMR: subspace-accelerated inexact (truncated) Newton
GD+1: subspace-accelerated quasi-Newton
Both near optimal for just a few eigenpairs
Cheaper projectors possible with JDQMR for many eigenvalues
PRIMME: a state-of-the-art eigensolver
Our recent research is promising for an optimal, limited-memory eigensolver [ 52 ]