Recent implementations, applications, and extensions of the Locally Optimal Block Preconditioned Conjugate Gradient method LOBPCG

Similar documents
Is there life after the Lanczos method? What is LOBPCG?

Preconditioned Eigenvalue Solvers for electronic structure calculations. Andrew V. Knyazev. Householder Symposium XVI May 26, 2005

Preconditioned Eigensolver LOBPCG in hypre and PETSc

arxiv: v1 [cs.ms] 18 May 2007

EIGIFP: A MATLAB Program for Solving Large Symmetric Generalized Eigenvalue Problems

MARCH 24-27, 2014 SAN JOSE, CA

Preconditioned Locally Minimal Residual Method for Computing Interior Eigenpairs of Symmetric Operators

Toward the Optimal Preconditioned Eigensolver: Locally Optimal Block Preconditioned Conjugate Gradient Method

Implementation of a preconditioned eigensolver using Hypre

APPLIED NUMERICAL LINEAR ALGEBRA

Arnoldi Methods in SLEPc

Conjugate-Gradient Eigenvalue Solvers in Computing Electronic Properties of Nanostructure Architectures

Multilevel Methods for Eigenspace Computations in Structural Dynamics

State-of-the-art numerical solution of large Hermitian eigenvalue problems. Andreas Stathopoulos

Numerical Methods in Matrix Computations

Multigrid absolute value preconditioning

Using the Karush-Kuhn-Tucker Conditions to Analyze the Convergence Rate of Preconditioned Eigenvalue Solvers

Max Planck Institute Magdeburg Preprints

Last Time. Social Network Graphs Betweenness. Graph Laplacian. Girvan-Newman Algorithm. Spectral Bisection

Block Iterative Eigensolvers for Sequences of Dense Correlated Eigenvalue Problems

AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning

A Domain Decomposition Based Jacobi-Davidson Algorithm for Quantum Dot Simulation

Multilevel spectral coarse space methods in FreeFem++ on parallel architectures

Iterative methods for symmetric eigenvalue problems

Solving Symmetric Indefinite Systems with Symmetric Positive Definite Preconditioners

Conjugate-Gradient Eigenvalue Solvers in Computing Electronic Properties of Nanostructure Architectures

DELFT UNIVERSITY OF TECHNOLOGY

ELECTRONIC STRUCTURE CALCULATIONS FOR THE SOLID STATE PHYSICS

A Parallel Implementation of the Davidson Method for Generalized Eigenproblems

Conjugate gradient acceleration of non-linear smoothing filters

A knowledge-based approach to high-performance computing in ab initio simulations.

Applied Mathematics 205. Unit V: Eigenvalue Problems. Lecturer: Dr. David Knezevic

Non-negative Matrix Factor Deconvolution; Extraction of Multiple Sound Sources from Monophonic Inputs

Opportunities for ELPA to Accelerate the Solution of the Bethe-Salpeter Eigenvalue Problem

Comparing the Efficiency of Iterative Eigenvalue Solvers: the Quantum ESPRESSO experience

Accelerating linear algebra computations with hybrid GPU-multicore systems.

FEAST eigenvalue algorithm and solver: review and perspectives

DELFT UNIVERSITY OF TECHNOLOGY

Mitglied der Helmholtz-Gemeinschaft. Linear algebra tasks in Materials Science: optimization and portability

On the Modification of an Eigenvalue Problem that Preserves an Eigenspace

Preconditioned inverse iteration and shift-invert Arnoldi method

PRECONDITIONED ITERATIVE METHODS FOR LINEAR SYSTEMS, EIGENVALUE AND SINGULAR VALUE PROBLEMS. Eugene Vecharynski. M.S., Belarus State University, 2006

SOLVING MESH EIGENPROBLEMS WITH MULTIGRID EFFICIENCY

Divide and conquer algorithms for large eigenvalue problems Yousef Saad Department of Computer Science and Engineering University of Minnesota

Preface to the Second Edition. Preface to the First Edition

Solution of eigenvalue problems. Subspace iteration, The symmetric Lanczos algorithm. Harmonic Ritz values, Jacobi-Davidson s method

TENSOR LAYERS FOR COMPRESSION OF DEEP LEARNING NETWORKS. Cris Cecka Senior Research Scientist, NVIDIA GTC 2018

ESLW_Drivers July 2017

FEM and sparse linear system solving

A CHEBYSHEV-DAVIDSON ALGORITHM FOR LARGE SYMMETRIC EIGENPROBLEMS

Contents. Preface... xi. Introduction...

GPU accelerated Arnoldi solver for small batched matrix

MagmaDNN High-Performance Data Analytics for Manycore GPUs and CPUs

Activity Mining in Sensor Networks

Conjugate-gradient eigenvalue solvers in computing electronic properties of nanostructure architectures. Stanimire Tomov*

problem Au = u by constructing an orthonormal basis V k = [v 1 ; : : : ; v k ], at each k th iteration step, and then nding an approximation for the e

PFEAST: A High Performance Sparse Eigenvalue Solver Using Distributed-Memory Linear Solvers

Linear Solvers. Andrew Hazel

Domain Decomposition-based contour integration eigenvalue solvers

A Parallel Scalable PETSc-Based Jacobi-Davidson Polynomial Eigensolver with Application in Quantum Dot Simulation

Matrix Algorithms. Volume II: Eigensystems. G. W. Stewart H1HJ1L. University of Maryland College Park, Maryland

c 2009 Society for Industrial and Applied Mathematics

Computing least squares condition numbers on hybrid multicore/gpu systems

6.4 Krylov Subspaces and Conjugate Gradients

Low Rank Approximation Lecture 7. Daniel Kressner Chair for Numerical Algorithms and HPC Institute of Mathematics, EPFL

Iterative Methods for Solving A x = b

DELFT UNIVERSITY OF TECHNOLOGY

A hybrid Hermitian general eigenvalue solver

ETNA Kent State University

Simulating the generation, evolution and fate of electronic excitations in molecular and nanoscale materials with first principles methods.

ECE521 W17 Tutorial 1. Renjie Liao & Min Bai

Conjugate gradient acceleration of non-linear smoothing filters Iterated edge-preserving smoothing

Domain decomposition on different levels of the Jacobi-Davidson method

LARGE SPARSE EIGENVALUE PROBLEMS. General Tools for Solving Large Eigen-Problems

Inexact and Nonlinear Extensions of the FEAST Eigenvalue Algorithm

Alternative correction equations in the Jacobi-Davidson method

Degeneracy in Maximal Clique Decomposition for Semidefinite Programs

A Parallel Implementation of the Trace Minimization Eigensolver

STEEPEST DESCENT AND CONJUGATE GRADIENT METHODS WITH VARIABLE PRECONDITIONING

Algebraic Multigrid as Solvers and as Preconditioner

Solution of eigenvalue problems. Subspace iteration, The symmetric Lanczos algorithm. Harmonic Ritz values, Jacobi-Davidson s method

LARGE SPARSE EIGENVALUE PROBLEMS

Iterative methods for Linear System of Equations. Joint Advanced Student School (JASS-2009)

A Comparison Between Polynomial and Locally Weighted Regression for Fault Detection and Diagnosis of HVAC Equipment

GPU ACCELERATED POLYNOMIAL SPECTRAL TRANSFORMATION METHODS

SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS. Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA

Algorithm for Sparse Approximate Inverse Preconditioners in the Conjugate Gradient Method

Recent advances in approximation using Krylov subspaces. V. Simoncini. Dipartimento di Matematica, Università di Bologna.

On the Superlinear Convergence of MINRES. Valeria Simoncini and Daniel B. Szyld. Report January 2012

College of William & Mary Department of Computer Science

Computational Methods. Eigenvalues and Singular Values

Deconvolution of Overlapping HPLC Aromatic Hydrocarbons Peaks Using Independent Component Analysis ICA

A Robust Preconditioned Iterative Method for the Navier-Stokes Equations with High Reynolds Numbers

Solving Ax = b, an overview. Program

The quadratic eigenvalue problem (QEP) is to find scalars λ and nonzero vectors u satisfying

Space-Variant Computer Vision: A Graph Theoretic Approach

An Asynchronous Algorithm on NetSolve Global Computing System

Numerical Methods I Non-Square and Sparse Linear Systems

Index. for generalized eigenvalue problem, butterfly form, 211

A Model-Trust-Region Framework for Symmetric Generalized Eigenvalue Problems

Transcription:

MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Recent implementations, applications, and extensions of the Locally Optimal Block Preconditioned Conjugate Gradient method LOBPCG Knyazev, A. TR2017-078 June 2017 Abstract Since introduction [A. Knyazev, Toward the optimal preconditioned eigensolver: Locally optimal block preconditioned conjugate gradient method, SISC (2001) doi:10.1137/s1064827500366124] and efficient parallel implementation [A. Knyazev et al., Block locally optimal preconditioned eigenvalue xolvers (BLOPEX) in HYPRE and PETSc, SISC (2007) doi:10.1137/060661624], LOBPCG has been used is a wide range of applications in mechanics, material sciences, and data sciences. We review its recent implementations and applications, as well as extensions of the local optimality idea beyond standard eigenvalue problems. Householder Symposium on Numerical Linear Algebra This work may not be copied or reproduced in whole or in part for any commercial purpose. Permission to copy in whole or in part without payment of fee is granted for nonprofit educational and research purposes provided that all such whole or partial copies include the following: a notice that such copying is by permission of Mitsubishi Electric Research Laboratories, Inc.; an acknowledgment of the authors and individual contributions to the work; and all applicable portions of the copyright notice. Copying, reproduction, or republishing for any other purpose shall require a license with payment of fee to Mitsubishi Electric Research Laboratories, Inc. All rights reserved. Copyright c Mitsubishi Electric Research Laboratories, Inc., 2017 201 Broadway, Cambridge, Massachusetts 02139

Recent implementations, applications, and extensions of the Locally Optimal Block Preconditioned Conjugate Gradient method (LOBPCG) Andrew Knyazev (Mitsubishi Electric Research Laboratories) Abstract Since introduction [A. Knyazev, Toward the optimal preconditioned eigensolver: Locally optimal block preconditioned conjugate gradient method, SISC (2001) doi:10.1137/s1064827500366124] and efficient parallel implementation [A. Knyazev et al., Block locally optimal preconditioned eigenvalue xolvers (BLOPEX) in HYPRE and PETSc, SISC (2007) doi:10.1137/060661624], LOBPCG has been used is a wide range of applications in mechanics, material sciences, and data sciences. We review its recent implementations and applications, as well as extensions of the local optimality idea beyond standard eigenvalue problems. 1 Background Kantorovich in 1948 has proposed calculating the smallest eigenvalue λ 1 of a symmetric matrix A by steepest descent using a direction r = Ax λ(x) of a scaled gradient of a Rayleigh quotient λ(x) = (x, Ax)/(x, x) in a scalar product (x, y) = x y, where the step size is computed by minimizing the Rayleigh quotient in the span of the vectors x and w, i.e. in a locally optimal manner. Samokish in 1958 has proposed applying a preconditioner T to the vector r to generate the preconditioned direction w = T r and derived asymptotic, as x approaches the eigenvector, convergence rate bounds. Block locally optimal multi-step steepest descent is described in Cullum and Willoughby in 1985. Local minimization of the Rayleigh quotient on the subspace spanned by the current approximation, the current residual and the previous approximation, as well as its block version, appear in AK PhD thesis; see 1986. The preconditioned version is analyzed in 1991 and 1998, and a practically stable implementation [1]. LOBPCG general technology can also be viewed as a particular case of generalized block Davidson diagonalization methods with thick restart, or accelerated block gradient descent with plane-search. The main points on LOBPCG are as follows: The costs per iteration and the memory use in LOBPCG are competitive with those of the Lanczos method, computing a single extreme eigenpair. Linear convergence is theoretically guaranteed and practically observed, since local optimality implies that LOBPCG converges at least as fast as the gradient descent. In numerical tests, LOBPCG typically shows no super-linear convergence. LOBPCG blocking allows utilizing highly efficient matrix-matrix operations, e.g., BLAS 3. LOBPCG can directly take advantage of preconditioning, in contrast to the Lanczos method. LOBPCG allows variable and non-symmetric or positive definite preconditioning. LOBPCG allows warm starts and computes an approximation to the eigenvector on every iteration. It has no numerical stability issues similar to those of the Lanczos method. LOBPCG is reasonably easy to implement, so many implementations have appeared. Very large block sizes in LOBPCG become expensive to deal with due to orthogonalizations and the use of the Rayleigh-Ritz method on every iteration. 1

2 General purpose open source public software implementations MATLAB/Octave version by AK of the reference algorithm from 2001, publicly available from MATLAB/Octave repositories. BLOPEX, developed by AK and his team [5]. An abstract implementation of the reference algorithm from 2001. Originally at Google Code, currently hosted at bitbucket.org by Jose E. Roman. BLOPEX included into SLEPc/PETSc by Jose E. Roman. BLOPEX incorporated into hypre by AK and his team with hypre developers since 2007. LOBPCG with orthogonalizations, Anasazi/Trilinos by U. Hetmaniuk and R. Lehoucq since 2006. LOBPCG in Java at Google Code Archive by M. E. Argentati. LOBPCG in Python: SciPy, scikit-learn and megaman for manifold learning. LOBPCG GPU implementation in NVIDIA AmgX and spectral graph partitioning. LOBPCG heterogeneous CPU-GPU implementation by H. Anzt, S. Tomov, and J. Dongarra using blocked sparse matrix vector products in MAGMA (2015). 3 LOBPCG in applications As a generic eigenvalue solver for large-scale symmetric eigenproblems, LOBPCG is being used in a wide range of applications. Our own small contribution is to data clustering and image segmentation (2003) [4] and (2015) [2], as well as for low-pass graph-based signal filtering (2015) [3]. Below we list some application-oriented open source software implementations of LOBPCG in various domains. Material Sciences: ABINIT (including CUDA version) by AK with ABINIT developers, since 2008. Implements density functional theory, using a plane wave basis set and pseudopotentials. Octopus TDDFT, employs pseudopotentials and real-space numerical grids. Pescan, nonselfconsistent calculations of electron/hole states in a plane wave basis set. Gordon Bell Prize finalist at ACM/IEEE Conferences on Supercomputing in 2005, [7]. Gordon Bell Prize finalist at ACM/IEEE Conferences on Supercomputing in 2006, [6]. KSSOLV A MATLAB Toolbox for Solving the Kohn Sham Equations, Yang (2009). TTPY, spectra of molecules using tensor train decomposition, Rakhuba, Oseledets (2016). PlatypusQM, PLATform for dynamic protein unified simulation, Takano et al. (2016). MFDn, Nuclear Configuration Interaction, Shao et al. (2016) 2

Data mining: sklearn: Machine Learning in Python, spectral clustering. Megaman: Scalable Manifold Learning in Python, McQueen et al. (2016). Multi-physics: NGSolve (Joachim Schoberl), a general purpose adaptive finite element library with multigrid. Electromagnetic calculations: NGSolve by Joachim Schoberl Maxwell solvers with Python interface. MFEM by Tzanio Kolev, highly scalable parallel high-order adaptive finite elements interfaced to hypre LOBPCG and multigrid preconditioning. Python based PYFEMax by R. Geus, P. Arbenz, L. Stingelin. Nedelec finite element discretisation of Maxwells equations. 4 Methods, motivated by LOBPCG Suppose that the number n b of vectors in the block must be small, e.g., due to memory limitations. Let us consider the extreme case n b = 1 of single-vector iterations only. At the same time, assume that we need to compute n v > n b eigenpairs. It is well-known that one can compute n b eigenpairs at the time, lock them together with all the previously computed eigenvectors, and sequentially compute the next n b eigenpairs in the orthogonal complement to all already locked approximate eigenvectors. E.g., if n b = 1, the eigenpairs are computed one-by-one sequentially. In contrast, LOBPCG II, see [1] utilities n v parallel execution of single vector (n b = 1) 3-term locally optimal recurrences combined with the Rayleigh-Ritz procedure on the n v -dimensional subspace spanned by the current approximations to all n v eigenvectors. E. Vecharynski, C. Yang, and J. Pask (2015) extend the LOBPCG II approach to n v /n b parallel execution of n b > 1 block-vector 3-term recurrences with additional orthogonalizations, and demonstrate that Rayleigh-Ritz can be omitted on many iterations, in Quantum Espresso plane-wave DFT electronic structure software. There are various other recent developments, related to LOBPCG: Indefinite variant of LOBPCG for definite matrix pencils, D. Kressner (2013) is based on an indefinite inner product, and generalizes LOBP4DCG method by Bai and Li (2012 and 2013) for solving product eigenvalue problems Folded spectrum to LOBPCG (FS-LOBPCG), V.S. Sunderam (2005) adds three more block vectors that store the matrix-vector products of the blocks X, R, and P. Szyld and Xue (2016) extend LOBPCG to nonlinear Hermitian eigenproblems with variational characterizations. Low-rank tensor related variants of LOBPCG: Hierarchical Tucker decomposition in low-rank LOBPCG, D. Kressner at al., 2011, 2016. Tensor-Train (TT) manifold-preconditioned LOBPCG for many-body Schrödinger equation, Oseledets at al., 2014, 2016 3

5 Conclusions Still nobody can pronounce it, but LOBPCG: has now a life on its own, with 7K hits in google search, accepted as one of standard methods in DFT electronic structure calculations, appears in most major HPC open source libraries: PETSc, hypre and Anasazi/Trilinos, e.g., pre-packaged in OpenHPC cluster management software implemented for GPUs for ABINIT and in MAGMA and AmgX More efforts are needed to decrease the use of orthogonalizations and Rayleigh-Ritz for large block sizes, while still preserving fast convergence, maintaining reasonable stability, and keeping the algorithm simple to implement. References [1] A. Knyazev. Toward the optimal preconditioned eigensolver: locally optimal block preconditioned conjugate gradient method. SIAM J. Scientific Computing, 23(2):517 541, 2001. doi:10.1561/0600000020. [2] A. Kniazev. Method for Kernel Correlation-Based Spectral Data Processing. US patent application 20150363361 http://www.freepatentsonline.com/y2015/0363361.html. [3] A. Knyazev and A. Malyshev. Accelerated graph-based spectral polynomial filters. In Machine Learning for Signal Processing (MLSP), 2015 IEEE 25th International Workshop on, page 16. IEEE Xplore, 2915. doi:10.1109/mlsp.2015.7324315. [4] A. V. Knyazev. Modern preconditioned eigensolvers for spectral image segmentation and graph bisection. In Boley, Dhillon, Ghosh, and Kogan, editors, Proceedings of the workshop Clustering Large Data Sets; Third IEEE International Conference on Data Mining (ICDM 2003), pages 59 62, Melbourne, Florida, 2003. IEEE Computer Society. http://math.ucdenver.edu/ ~aknyazev/research/conf/icdm03.pdf. [5] A. V. Knyazev, M. E. Argentati, I. Lashuk, and E. E. Ovtchinnikov. Block locally optimal preconditioned eigenvalue xolvers (blopex) in hypre and petsc. SIAM J. Scientific Computing, 29(5):2224 2239, 2007. doi:10.1137/060661624. [6] S. Yamada, T. Imamura, T. Kano, and M. Machida. High-performance computing for exact numerical approaches to quantum many-body problems on the earth simulator. In Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, SC 06, New York, NY, USA, 2006. ACM. doi:10.1145/1188455.1188504. [7] S. Yamada, T. Imamura, and M. Machida. 16.447 tflops and 159-billion-dimensional exactdiagonalization for trapped fermion-hubbard model on the earth simulator. In Supercomputing, 2005. Proceedings of the ACM/IEEE SC 2005 Conference, pages 44 44, Nov 2005. doi:10.1109/sc.2005.1. 4