Preconditioned Eigenvalue Solvers for electronic structure calculations. Andrew V. Knyazev. Householder Symposium XVI May 26, 2005


1 Preconditioned Eigenvalue Solvers for electronic structure calculations. Andrew V. Knyazev, Department of Mathematics and Center for Computational Mathematics, University of Colorado at Denver. Householder Symposium XVI, May 26, 2005. Partially supported by the NSF and the LLNL Center for Applied Scientific Computing.

2 Abstract. We describe the locally optimal block preconditioned conjugate gradient (LOBPCG) method in the framework of ABINIT and the Vienna Ab-initio Simulation Package (VASP). Several methods are available in ABINIT/VASP to calculate the electronic ground state: a simple block Davidson iteration scheme, a single band steepest descent scheme, conjugate gradient optimization, and a residual minimization method. LOBPCG can be interpreted as a conjugate gradient optimization, different from those used in ABINIT/VASP. It can also be viewed as a steepest descent scheme augmented with extra vectors in the basis set, namely with the wave functions from the previous iteration step, not with the residuals as implemented in VASP. Finally, it can be seen as a simplified, specially restarted block Davidson method. We describe LOBPCG and compare it with the ABINIT/VASP algorithms.

3 Acknowledgements. Partially supported by the NSF and the LLNL Center for Applied Scientific Computing. We would like to thank Rob Falgout, Charles Tong, Panayot Vassilevski, and other members of the Hypre Scalable Linear Solvers project team at LLNL for their support in implementing our LOBPCG eigensolver in Hypre. John Pask and Jean-Luc Fattebert of LLNL provided invaluable help in getting us involved in electronic structure calculations. LOBPCG is implemented in ABINIT rev. 4.5 and above by G. Zérah.

4 Outline
1. Electronic Structure Calculations
2. The basics of ABINIT and VASP
3. VASP band by band preconditioned steepest descent
4. ABINIT/VASP band by band preconditioned conjugate gradients
5. Locally optimal band by band preconditioned conjugate gradients
6. VASP simple block Davidson (block preconditioned steepest descent)
7. ABINIT LOBPCG
8. Conclusions

5 Electronic Structure Calculations. Quantum mechanics: a system of interacting electrons and nuclei is described by the many-body wavefunction $\Psi$ (with $M^N$ complexity, where $M$ is the number of space points and $N$ is the number of electrons), a solution of the Schrödinger equation $H\Psi = E\Psi$, where $H$ is the Hamiltonian operator and $E$ is the total energy of the system. The Hamiltonian contains the kinetic operators of each individual electron and nucleus and all pair-wise Coulomb interactions. Density Functional Theory (Kohn and Sham): the many-body Schrödinger equation can be reduced to an effective one-particle equation, $[-\nabla^2 + v_{\mathrm{eff}}(r)]\,\psi_i(r) = \epsilon_i\,\psi_i(r)$, which depends on the electronic density $\sum_{j=1}^{N} |\psi_j(r)|^2$ (self-consistency), with $N$ occupied and mutually orthogonal single-particle wave functions $\psi_i$; $M \cdot N$ total complexity.

6 http://www.abinit.org/
First-principles computation of material properties: the ABINIT software project, X. Gonze, J.-M. Beuken, R. Caracas, F. Detraux, M. Fuchs, G.-M. Rignanese, L. Sindic, M. Verstraete, G. Zérah, F. Jollet, M. Torrent, A. Roy, M. Mikami, Ph. Ghosez, J.-Y. Raty, D.C. Allan, Computational Materials Science 25, 478-492 (2002).
http://cms.mpi.univie.ac.at/vasp/
Efficiency of ab-initio total energy calculations for metals and semiconductors using a plane-wave basis set, G. Kresse and J. Furthmüller, Comput. Mat. Sci. 6, 15-50 (1996).
http://math.cudenver.edu/~aknyazev/
A. V. Knyazev, Toward the optimal preconditioned eigensolver: locally optimal block preconditioned conjugate gradient method, SIAM J. Sci. Comput. 23, no. 2, 517-541 (2001).

7 The basics of ABINIT and VASP. ABINIT and the Vienna Ab-initio Simulation Package (VASP) are complex packages for performing ab-initio quantum-mechanical molecular dynamics simulations using a plane wave (PW) basis set. Generally no more than 100 PW per atom are required to describe bulk materials; in most cases even 50 PW per atom suffice for a reliable description. Multiplication by the Hamiltonian in the plane wave basis involves both Fourier space (for the Laplacian) and real space (for the potential), so the FFT is called twice per multiplication. Since the Laplacian is diagonal in Fourier space, diagonal preconditioning is useful. Alternatively, real-space FEM and finite difference codes are available, e.g., Multigrid Instead of the K-spAce (MIKA). In real-space approximations the number of basis functions is 10-100 times larger than with plane waves, and sophisticated preconditioning (usually multigrid) is necessary.
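To make the two-FFT structure of the Hamiltonian multiplication concrete, here is a minimal 1-D sketch in Python/NumPy; the grid size, box length, and local potential are illustrative toys, not ABINIT/VASP values:

```python
import numpy as np

M = 256                                        # plane-wave grid size (illustrative)
L = 10.0                                       # box length (illustrative)
k = 2.0 * np.pi * np.fft.fftfreq(M, d=L / M)   # angular wave numbers
x = np.linspace(0.0, L, M, endpoint=False)
v = 0.5 * np.cos(2.0 * np.pi * x / L)          # toy local potential

def apply_H(psi):
    """H psi = -psi'' + v psi: the Laplacian is applied in Fourier space and the
    potential in real space, so each multiplication by H costs two FFTs."""
    return np.fft.ifft(k**2 * np.fft.fft(psi)) + v * psi

def precondition(r):
    """Diagonal Fourier-space preconditioner, roughly an inverse of the kinetic
    term (shifted by 1 to stay positive near k = 0); a toy stand-in for K."""
    return np.fft.ifft(np.fft.fft(r) / (k**2 + 1.0))
```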

8 The execution time scales like $C\,M\,N^2$ for some parts of the code, where $M$ is the number of PW basis functions and $N$ is the number of valence electrons in the system. The constant $C$ is controlled by keeping the number of orthogonalizations small. For systems with roughly 2000 electronic bands, the $CMN^2$ part in VASP becomes comparable to the other parts. Systems with up to 4000 valence electrons can be solved. VASP and ABINIT use a self-consistency cycle to calculate the electronic ground state of the Kohn-Sham equation.

9 The Kohn-Sham (KS) eigenvalue problem (eigenproblem) to solve is $H\,\phi_n = \epsilon_n\,S\,\phi_n$, $n = 1,\dots,N_b$, where $H$ is the Kohn-Sham Hamiltonian, $\phi_n$ is the band eigenstate wave function, $\epsilon_n$ is the eigenvalue describing the band energy, and $S$ is the overlap matrix, specifying the orthonormality constraints on eigenstates of different bands. In the PW basis, the calculations of $H\phi_n$ and $S\phi_n$ are relatively inexpensive. The matrices $H$ and $S$ are Hermitian, and $S$ is positive definite. In VASP, $S$ is not the identity because of the use of ultra-soft pseudopotentials. $N_b \geq N$ wavefunctions of occupied orbitals are computed. In a self-consistent calculation, optimization of the charge density, appearing in the Hamiltonian, and of the wavefunctions $\phi_n$ of the KS eigenproblem is done in a cycle. In this talk, we consider only the numerical solution of the KS eigenproblem, not the charge density updates.

10 The expectation value of the Hamiltonian for a wavefunction $\phi_{\mathrm{app}}$, $\epsilon_{\mathrm{app}} = \frac{\langle \phi_{\mathrm{app}} | H | \phi_{\mathrm{app}} \rangle}{\langle \phi_{\mathrm{app}} | S | \phi_{\mathrm{app}} \rangle}$, is called the Rayleigh quotient. Variation of the Rayleigh quotient with respect to $\phi_{\mathrm{app}}$ leads to the residual vector, defined as $R(\phi_{\mathrm{app}}) = (H - \epsilon_{\mathrm{app}} S)\,\phi_{\mathrm{app}}$. To accelerate convergence, a preconditioner $K$ is applied to the residual: $p(\phi_{\mathrm{app}}) = K\,R(\phi_{\mathrm{app}}) = K (H - \epsilon_{\mathrm{app}} S)\,\phi_{\mathrm{app}}$. The preconditioner $K$ is Hermitian, often simply the inverse of the main diagonal of $H - \epsilon_{\mathrm{app}} S$ or of $H$; the latter is used in our tests for this talk.
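The three quantities on this slide translate directly into code. A sketch with NumPy, where H and S are Hermitian matrices (anything supporting `@`) and K is a callable preconditioner; all names are illustrative:

```python
import numpy as np

def rayleigh_quotient(H, S, phi):
    # eps_app = <phi|H|phi> / <phi|S|phi>
    return np.vdot(phi, H @ phi).real / np.vdot(phi, S @ phi).real

def residual(H, S, phi):
    # R(phi) = (H - eps_app S) phi
    return H @ phi - rayleigh_quotient(H, S, phi) * (S @ phi)

def preconditioned_residual(H, S, phi, K):
    # p(phi) = K R(phi); K is a callable, e.g. the inverse diagonal of H
    # as used in the talk's tests: K = lambda r: r / np.diag(H)
    return K(residual(H, S, phi))
```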

11 Another core technique used in VASP is the Rayleigh-Ritz variational scheme in a subspace spanned by $N_a$ wavefunctions $\phi_i$, $i = 1,\dots,N_a$: $\langle \phi_j | H | \phi_i \rangle = H_{ij}$, $\langle \phi_j | S | \phi_i \rangle = S_{ij}$, $H_{ij} U_{jk} = \epsilon_k S_{ij} U_{jk}$, $\phi_j = U_{jk} \phi_k$, using the standard summation convention on repeated indexes from 1 to $N_a$. Here, in the first two steps we compute the $N_a$-by-$N_a$ Gram matrices of the KS quadratic forms with respect to the given basis of wavefunctions. Then we solve a matrix generalized eigenproblem with the Gram matrices to compute the Ritz values $\epsilon_k$. Finally, we calculate the Ritz functions $\phi_j$. The set of wavefunctions here is not assumed to be S-orthonormal and may include the approximate KS eigenfunctions as well as preconditioned residuals and other functions.
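A dense sketch of this Rayleigh-Ritz step (illustrative names; `scipy.linalg.eigh` solves the generalized eigenproblem formed by the two Gram matrices):

```python
import numpy as np
from scipy.linalg import eigh

def rayleigh_ritz(H, S, V):
    # Gram matrices of the KS quadratic forms in the basis given by V's columns.
    Hg = V.conj().T @ (H @ V)     # H_ij = <phi_j|H|phi_i>
    Sg = V.conj().T @ (S @ V)     # S_ij = <phi_j|S|phi_i>
    # Generalized eigenproblem H_ij U_jk = eps_k S_ij U_jk.
    ritz_values, U = eigh(Hg, Sg)
    return ritz_values, V @ U     # Ritz values (ascending) and Ritz functions
```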

12 VASP single band preconditioned steepest descent. The method is robust and simple. It utilizes the two-term recurrence $\phi^{(M+1)} \in \mathrm{span}\{ p^{(M)}, \phi^{(M)} \}$, where $p^{(M)} = p(\phi^{(M)})$ is the preconditioned residual at the M-th iteration, and $\phi^{(M+1)}$ is chosen to minimize the Rayleigh quotient $\epsilon^{(M+1)}$ by the Rayleigh-Ritz method on this two-dimensional subspace. It can only compute (approximately) the lowest eigenstate $\phi_1$. The asymptotic convergence speed is determined, when the preconditioner $K$ is positive definite, by the ratio $\kappa$ of the largest and smallest positive eigenvalues of $K(H - \epsilon_1 S)$. Two-term recurrence + Rayleigh-Ritz method = Single Band Steepest Descent Method.
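Combining the helpers sketched above gives a minimal single band preconditioned steepest descent loop; this is an illustration of the two-term recurrence, not the VASP implementation (it reuses `residual`, `rayleigh_ritz`, and `rayleigh_quotient` from the earlier sketches):

```python
import numpy as np

def sd_lowest_eigenpair(H, S, K, phi0, maxit=1000, tol=1e-8):
    # Normalize the initial guess in the S-inner product.
    phi = phi0 / np.sqrt(np.vdot(phi0, S @ phi0).real)
    for _ in range(maxit):
        r = residual(H, S, phi)              # R = (H - eps S) phi
        if np.linalg.norm(r) < tol:
            break
        p = K(r)                             # preconditioned residual
        # Rayleigh-Ritz on the two-dimensional subspace span{phi, p}.
        _, ritz = rayleigh_ritz(H, S, np.column_stack([phi, p]))
        phi = ritz[:, 0]                     # minimizer of the Rayleigh quotient
        phi /= np.sqrt(np.vdot(phi, S @ phi).real)
    return rayleigh_quotient(H, S, phi), phi
```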

13 [Two semilog panels: absolute eigenvalue errors vs. iteration, VASP single band steepest descent.] Figure 1: SD, $\kappa \approx 4\cdot10^2$ (left), and with diagonal preconditioning, $\kappa \approx 2$ (right); no cluster of eigenvalues in these tests.

14 [Two semilog panels: absolute eigenvalue errors vs. iteration, VASP single band steepest descent.] Figure 2: SD, $\kappa \approx 10^5$ (left), and with diagonal preconditioning, $\kappa \approx 6\cdot10^2$ (right), for an eigenvalue from a cluster of width $10^{-3}$. Fast convergence at the beginning, until the cluster must be resolved.

15 VASP band by band preconditioned steepest descent. After we have computed the approximate lowest eigenstate $\phi_1$, e.g., using the VASP single band preconditioned steepest descent method from the previous slide, we can compute the next lowest one by working in the S-orthogonal complement of $\phi_1$. We need to orthogonalize the initial guess, and the current preconditioned residual vector $p^{(M)}$ is replaced with $g^{(M)} = (1 - |\phi_1\rangle\langle\phi_1| S)\, p^{(M)}$. If we apply the single band preconditioned steepest descent again in the S-orthogonal complement, we can compute a number of the lowest eigenstates band by band. Asymptotic convergence is not cluster robust: no guarantee for cluster calculations! Single Band Preconditioned Method + Orthogonalization to the Previously Computed Eigenstates = Band by Band Preconditioned Method.
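A sketch of the S-orthogonal projection used for this band by band deflation, assuming the converged eigenstates (columns of the illustrative `Phi_done`) are S-orthonormal:

```python
import numpy as np

def deflate(p, Phi_done, S):
    # g = (I - sum_j |phi_j><phi_j| S) p, assuming <phi_i|S|phi_j> = delta_ij;
    # apply to the preconditioned residual and to the initial guess.
    return p - Phi_done @ (Phi_done.conj().T @ (S @ p))
```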

16 Single band preconditioned conjugate gradients. A well known way to accelerate convergence of the preconditioned steepest descent is to replace it with preconditioned conjugate gradients (PCG). The Rayleigh quotient that we want to minimize is a smooth function for nonzero wavefunctions, and its gradient and Hessian are easy to calculate explicitly. Any version of PCG for smooth non-quadratic functions can be tried for the minimization of the Rayleigh quotient. VASP apparently (the method description is too sketchy) uses a version known in optimization as the Fletcher-Reeves formula: $f^{(0)} = 0$, $f^{(M)} = p^{(M)} + \mathrm{sign}(\gamma_\phi^{(M)})\, \frac{\langle p^{(M)} | R^{(M)} \rangle}{\langle p^{(M-1)} | R^{(M-1)} \rangle}\, f^{(M-1)}$. Then $f^{(M)}$ replaces $p^{(M)}$ in the two-dimensional trial subspace, so that $\phi^{(M+1)} = \gamma_f^{(M)} f^{(M)} + \gamma_\phi^{(M)} \phi^{(M)}$ in the Rayleigh-Ritz procedure.
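As a sketch, the Fletcher-Reeves update above may be written as follows; `gamma_phi` stands for the coefficient $\gamma_\phi^{(M)}$ returned by the previous Rayleigh-Ritz step, and all names are illustrative:

```python
import numpy as np

def fletcher_reeves_direction(p, r, p_prev, r_prev, f_prev, gamma_phi):
    # beta = sign(gamma_phi) * <p|R> / <p_prev|R_prev>; for M = 0 take f = p.
    beta = np.sign(gamma_phi) * np.vdot(p, r).real / np.vdot(p_prev, r_prev).real
    return p + beta * f_prev   # f^{(M)} replaces p^{(M)} in the 2-D trial subspace
```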

17 Band by band preconditioned conjugate gradients. After we have computed the approximate lowest eigenstate $\phi_1$, e.g., using the single band preconditioned conjugate gradient method from the previous slide, we can compute the next lowest one by working in the S-orthogonal complement of $\phi_1$. We need to orthogonalize the initial guess, and the current preconditioned residual vector $p^{(M)}$ is replaced with $g^{(M)} = (1 - |\phi_1\rangle\langle\phi_1| S)\, p^{(M)}$. The Fletcher-Reeves formula changes to $f^{(0)} = 0$, $f^{(M)} = g^{(M)} + \mathrm{sign}(\gamma_\phi^{(M)})\, \frac{\langle g^{(M)} | R^{(M)} \rangle}{\langle g^{(M-1)} | R^{(M-1)} \rangle}\, f^{(M-1)}$, and $\phi^{(M+1)} = \gamma_f^{(M)} f^{(M)} + \gamma_\phi^{(M)} \phi^{(M)}$ in the Rayleigh-Ritz procedure.

18 [Four semilog panels: absolute eigenvalue errors vs. iteration, VASP single band steepest descent (top) and VASP single band CG (bottom).] Figure 3: SD and CG without (left) and with (right) preconditioning.

19 Locally optimal single band PCG. The method combines the robustness and simplicity of the preconditioned steepest descent method with a three-term recurrence formula: $\phi^{(M+1)} \in \mathrm{span}\{ p^{(M)}, \phi^{(M)}, \phi^{(M-1)} \}$, where $p^{(M)} = p(\phi^{(M)})$ is the preconditioned residual at the M-th iteration, and $\phi^{(M+1)}$ is chosen to minimize the Rayleigh quotient $\epsilon^{(M+1)}$ by the Rayleigh-Ritz method on this three-dimensional subspace. The asymptotic convergence speed to the lowest eigenstate $\phi_1$ depends, when the preconditioner $K$ is positive definite, on the square root of the ratio $\kappa$ of the largest and smallest positive eigenvalues of $K(H - \epsilon_1 S)$. Three-term recurrence + Rayleigh-Ritz method = Locally Optimal Conjugate Gradient Method.
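One step of the locally optimal recurrence, reusing the `residual` and `rayleigh_ritz` helpers sketched earlier; this illustrates the plain three-term recurrence, while a numerically stable version would use the implicit linear combination discussed on slide 21:

```python
import numpy as np

def lo_pcg_step(H, S, K, phi, phi_prev):
    # Rayleigh-Ritz on the three-dimensional subspace span{phi, p, phi_prev}
    # (two-dimensional on the very first step, when phi_prev is None).
    p = K(residual(H, S, phi))
    cols = [phi, p] if phi_prev is None else [phi, p, phi_prev]
    _, ritz = rayleigh_ritz(H, S, np.column_stack(cols))
    phi_new = ritz[:, 0]
    phi_new = phi_new / np.sqrt(np.vdot(phi_new, S @ phi_new).real)
    return phi_new, phi      # the returned pair feeds the next call
```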

20 Locally optimal band by band PCG. After we have computed the approximate lowest eigenstate $\phi_1$, e.g., using the locally optimal single band PCG method from the previous slide, we can compute the next lowest one by working in the S-orthogonal complement of $\phi_1$. We need to orthogonalize the initial guess, and the current preconditioned residual vector $p^{(M)}$ is replaced with $g^{(M)} = (1 - |\phi_1\rangle\langle\phi_1| S)\, p^{(M)}$. No other changes are needed. The band by band implementation of the locally optimal PCG method shares the same advantages and difficulties as other band by band iterative schemes: low memory requirements, but trouble resolving clusters of eigenvalues.

21 Locally optimal vs. Fletcher-Reeves PCG. The methods converge quite similarly in practice in many cases, though a theoretical explanation of this fact is still not available. The costs per iteration are also similar, but the locally optimal method is slightly more expensive. A reasonably stable implementation of the locally optimal method is known (Knyazev, 2001) that uses an implicitly computed linear combination of $\phi^{(M)}$ and $\phi^{(M-1)}$, instead of $\phi^{(M-1)}$ itself, in the Rayleigh-Ritz procedure. The convergence of the locally optimal method is well supported theoretically, while the convergence of the Fletcher-Reeves version is not. Perhaps the main practical advantage of the locally optimal method is that it allows a straightforward block generalization to calculate several eigenpairs simultaneously.

22 Block eigenvalue iterations. The well known idea of simultaneous, or block, iterations provides an important improvement over single-vector methods: it permits us to compute an invariant subspace, rather than one eigenvector at a time, which allows robust calculation of clustered eigenvalues and the corresponding invariant subspaces. It can also serve as an acceleration technique over single-vector methods on parallel computers, as convergence for extreme eigenvalues usually improves with the size of the block, and every step can be naturally implemented on a wide variety of multiprocessor computers, as well as take advantage of processor cache through the use of high-level BLAS libraries.

23 VASP simple block Davidson (block preconditioned steepest descent). $\phi_i^{(M+1)} \in \mathrm{span}\{ p_1^{(M)},\dots,p_{N_b}^{(M)},\ \phi_1^{(M)},\dots,\phi_{N_b}^{(M)} \}$, $i = 1,\dots,N_b$, where $p_i^{(M)} = p(\phi_i^{(M)})$ is the preconditioned residual at the M-th iteration for the i-th band, and the $\phi_i^{(M+1)}$ are chosen as Ritz functions by the Rayleigh-Ritz method on this $2N_b$-dimensional subspace, to approximate the $N_b$ lowest exact KS eigenstates $\phi_i$. The asymptotic convergence speed for the i-th band is determined, when the preconditioner $K$ is positive definite, by the ratio $\kappa_i$ of the largest and smallest positive eigenvalues of $K(H - \epsilon_i S)$. Trial subspace of all approximate eigenfunctions and residuals + Rayleigh-Ritz method = Block Steepest Descent Method.

24 [Four semilog panels: eigenvalue errors vs. iteration, VASP single band steepest descent (top) and VASP simple block Davidson (bottom).] Figure 4: SD and block SD without (left) and with (right) preconditioning.

25 [Four semilog panels: eigenvalue errors vs. iteration, VASP single band steepest descent (top) and VASP simple block Davidson (bottom).] Figure 5: SD and block SD without (left) and with (right) preconditioning.

26 General block Davidson with a trivial restart. The block preconditioned steepest descent is the simplest case of the general block Davidson method. In the general block Davidson method, the trial subspace of the Rayleigh-Ritz method grows at each step by the $N_b$ current preconditioned residuals, i.e., in its original formulation the memory requirements grow by $N_b$ vectors with every iteration. Such a method converges faster than the block steepest descent, but the costs grow with the number of iterations, so a restart is needed in practice. A trivial restart at every step restricts the trial subspace to the $N_b$ approximate eigenfunctions and one set of $N_b$ preconditioned residuals, thus leading to the block preconditioned steepest descent from the previous slide, utilized in VASP.

27 General block Davidson with a non-standard restart. A non-standard restart in the block Davidson method is suggested in: C. Murray, S. C. Racine, E. R. Davidson, Improved algorithms for the lowest few eigenvalues and associated eigenvectors of large matrices, J. Comput. Phys. 103 (1992), no. 2, 382-389; A. V. Knyazev, Toward the optimal preconditioned eigensolver: locally optimal block preconditioned conjugate gradient method, SIAM J. Sci. Comput. 23 (2001), no. 2, 517-541 (electronic). In addition to the usual $N_b$ current approximate eigenfunctions $\phi_1^{(M)},\dots,\phi_{N_b}^{(M)}$, we also add the $N_b$ approximate eigenfunctions from the previous step: $\phi_1^{(M-1)},\dots,\phi_{N_b}^{(M-1)}$.

28 Locally optimal block PCG (LOBPCG). Approximate eigenfunctions $\phi_i^{(M+1)}$ are chosen from $\mathrm{span}\{ p_1^{(M)},\dots,p_{N_b}^{(M)},\ \phi_1^{(M)},\dots,\phi_{N_b}^{(M)},\ \phi_1^{(M-1)},\dots,\phi_{N_b}^{(M-1)} \}$, where $p_i^{(M)} = p(\phi_i^{(M)})$ is the preconditioned residual at the M-th iteration for the i-th band, and the $\phi_i^{(M+1)}$ are chosen as Ritz functions by the Rayleigh-Ritz method on this $3N_b$-dimensional subspace, to approximate the $N_b$ lowest exact KS eigenstates $\phi_i$. The asymptotic convergence speed for the i-th band depends, when the preconditioner $K$ is positive definite, on the square root of the ratio $\kappa_i$ of the largest and smallest positive eigenvalues of $K(H - \epsilon_i S)$. Current and one set of previous approximate eigenfunctions plus residuals + Rayleigh-Ritz method = Block Conjugate Gradient Method.
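For reference, SciPy ships an implementation of this method as `scipy.sparse.linalg.lobpcg`. A usage sketch with toy stand-ins for the Kohn-Sham Hamiltonian H, the overlap matrix S, and the preconditioner K (sizes and operators are illustrative):

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import lobpcg

n, nb = 1000, 10
H = diags([np.arange(1.0, n + 1.0)], [0]).tocsr()        # toy Hermitian H
S = diags([np.full(n, 1.0)], [0]).tocsr()                # toy overlap (identity here)
M = diags([1.0 / np.arange(1.0, n + 1.0)], [0]).tocsr()  # K ~ inverse diagonal of H
X = np.random.default_rng(0).standard_normal((n, nb))    # block of initial guesses

# B is the overlap matrix of the generalized problem, M the preconditioner;
# largest=False asks for the nb lowest eigenpairs.
eps, Phi = lobpcg(H, X, B=S, M=M, largest=False, tol=1e-8, maxiter=200)
print(eps)   # nb lowest Ritz values
```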

29 LOBPCG interpretations: LOBPCG can be interpreted as a specific version of conjugate gradient optimization, different from those used in VASP, that can easily be applied in block form. It can also be viewed as the block steepest descent scheme augmented with extra vectors in the basis set, namely with the wavefunctions from the previous iteration step, not with the residuals as in the block Davidson method. Finally, it can be seen as a simplified, specially restarted block Davidson method. It appears to preserve the fast convergence of the block Davidson method at much lower cost.

30 [Four semilog panels: eigenvalue errors vs. iteration, VASP single band CG (top) and LO Block CG (bottom).] Figure 6: CG and block CG without (left) and with (right) preconditioning.

31 [Four semilog panels: eigenvalue errors vs. iteration, VASP single band CG (top) and LO Block CG (bottom).] Figure 7: CG and block CG without (left) and with (right) preconditioning.

32 Implementing LOBPCG in ABINIT. ABINIT is a FORTRAN 90 parallel code with timing. ABINIT rev. 4.5 manual: "The algorithm LOBPCG is available as an alternative to band-by-band conjugate gradient, from Gilles Zérah. Use wfoptalg=4. See Test v4 No. 93 and 94. The band-by-band parallelization of this algorithm should be much better than the one of the usual algorithm." The implementation follows my MATLAB code of LOBPCG rev. 4.10, with some simplifications.

33 Testing LOBPCG in ABINIT. Test example results provided by G. Zérah: 32 boron atoms and 88 bands, non-self-consistent. Band-by-band PCG: every band converges in 4-6 iterations. LOBPCG with block size 88 converges in 2-4 iterations, approximately 2 times faster. LOBPCG with block size 20 and partial convergence (we do not wait for the previous block to converge before computing the next, by limiting the number of LOBPCG iterations to nline=4) is, surprisingly, even faster. We can even set nline=2 and get the same overall complexity as for nline=4. Mixed results for LOBPCG II applied to the self-consistent calculation of hydrogen and the non-self-consistent 32 boron atoms: slow convergence for some bands.

34 LOBPCG in PESCAN. Work by others on LOBPCG in materials science: in "Comparison of Nonlinear Conjugate-Gradient Methods for Computing the Electronic Properties of Nanostructure Architectures", Stanimire Tomov, Julien Langou, Andrew Canning, Lin-Wang Wang, and Jack Dongarra implement and test LOBPCG in PESCAN, where a semi-empirical potential or a charge-patching method is used to construct the potential and only the eigenstates of interest around a given energy are calculated using the folded spectrum: "if memory is not a problem and a block version of the matrix-vector multiplication can be efficiently implemented, the FS-LOBPCG will be the method of choice for the type of problems discussed."

35 Conclusions
- LOBPCG is a valuable alternative to the eigensolvers currently used in electronic structure calculations.
- Initial numerical results look promising, but more testing is needed on larger problems.
Future work:
- ABINIT LOBPCG testing vs. the default CG solver
- New LOBPCG modifications with reduced orthogonalization costs
- Revisiting traditional preconditioning
- Collaboration with John Pask of LLNL on implementing LOBPCG in the in-house FEM electronic structure calculations code