Domain Decomposition-based contour integration eigenvalue solvers

Domain Decomposition-based contour integration eigenvalue solvers
Vassilis Kalantzis, joint work with Yousef Saad
Computer Science and Engineering Department, University of Minnesota - Twin Cities, USA
SIAM ALA 2015, 10-30-2015
VK, YS (U of M) DD-CINT techniques 10-30-2015 1 / 31

Acknowledgments
- Joint work with Eric Polizzi and James Kestyn (UMass Amherst).
- Special thanks to the Minnesota Supercomputing Institute for allowing access to its supercomputers.
- Work supported by NSF and DOE (DE-SC0008877).

Contents
1 Introduction
2 The Domain Decomposition framework
3 Domain Decomposition-based contour integration
4 Implementation in HPC architectures
5 Experiments
6 Discussion

Introduction
The sparse symmetric eigenvalue problem
  $Ax = \lambda x$, where $A \in \mathbb{R}^{n \times n}$, $x \in \mathbb{R}^{n}$ and $\lambda \in \mathbb{R}$.
  A pair $(\lambda, x)$ is called an eigenpair of $A$.
Focus in this talk
  Find all pairs $(\lambda, x)$ with $\lambda$ inside the interval $[\alpha, \beta]$, where $\alpha, \beta \in \mathbb{R}$ and $\lambda_1 \le \alpha$, $\beta \le \lambda_n$.
Typical approach
  Project $A$ on a low-dimensional subspace: $V^T A V y = \theta V^T V y$, $x = Vy$.
  Choices of $V$: Krylov, (Generalized-Jacobi)-Davidson, contour integration, ...
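The projection step above can be illustrated with a small Rayleigh-Ritz sketch. This is only an illustration under assumptions that are not in the slides: a dense NumPy setting, a 1D Laplacian as the test matrix, and a hypothetical helper name `rayleigh_ritz`. The basis is orthonormalized first, so the projected problem $V^T A V y = \theta V^T V y$ becomes a standard one.

```python
import numpy as np

def rayleigh_ritz(A, V):
    """Solve V^T A V y = theta V^T V y and return (theta, X = V y)."""
    # Orthonormalize the basis so that V^T V = I in the projected problem.
    Q, _ = np.linalg.qr(V)
    theta, Y = np.linalg.eigh(Q.T @ A @ Q)
    return theta, Q @ Y

# 1D Laplacian (tridiagonal 2, -1), used only as a small symmetric test matrix.
n = 50
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
lam, X = np.linalg.eigh(A)

# If V spans an exact invariant subspace, the Ritz values are exact eigenvalues.
rng = np.random.default_rng(0)
V = X[:, 10:15] @ rng.standard_normal((5, 5))   # mixes five exact eigenvectors
theta, Xr = rayleigh_ritz(A, V)
residual = np.linalg.norm(A @ Xr - Xr * theta)
```

When `V` only approximates the invariant subspace, as with a contour-integration filter, the same routine returns approximate eigenpairs whose residuals indicate the quality of the subspace.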

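Contour integration (CINT)
$$V := P\hat{V} = \frac{1}{2i\pi} \int_{\Gamma} (\zeta I - A)^{-1} \, d\zeta \; \hat{V} \equiv XX^T \hat{V},$$
with $\Gamma$ a complex contour through the endpoints of $[\alpha, \beta]$. $V$ is an exact invariant subspace.
Numerical approximation
$$P\hat{V} \approx \tilde{P}\hat{V} = \sum_{j=1}^{n_c} \omega_j (\zeta_j I - A)^{-1} \hat{V}, \qquad \rho(z) = \sum_{j=1}^{n_c} \frac{\omega_j}{\zeta_j - z},$$
with (weight, pole) pairs $(\omega_j, \zeta_j)$, $j = 1, \ldots, n_c$.
- Quadrature rules: Trapezoidal, Midpoint, Gauss-Legendre, ...
- Other rational filters: Zolotarev, Least-Squares, ...
- Methods: FEAST (Polizzi), Sakurai-Sugiura (SS), SS-CIRR, ...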
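To make the rational filter $\rho(z)$ concrete, here is a minimal sketch. It assumes a midpoint rule on the full circle through $\alpha$ and $\beta$, which is one of the rules listed above but not necessarily the one used in the experiments. The helper names `circle_quadrature` and `rho` are hypothetical.

```python
import numpy as np

def circle_quadrature(alpha, beta, n_c):
    """Midpoint-rule poles zeta_j and weights omega_j on the circle through alpha and beta."""
    c, r = 0.5 * (alpha + beta), 0.5 * (beta - alpha)
    theta = 2.0 * np.pi * (np.arange(n_c) + 0.5) / n_c
    u = np.exp(1j * theta)
    # zeta = c + r e^{i theta}; d(zeta) = i r e^{i theta} d(theta), so omega_j = r e^{i theta_j} / n_c.
    return c + r * u, r * u / n_c

def rho(z, zeta, omega):
    """Rational filter rho(z) = sum_j omega_j / (zeta_j - z)."""
    z = np.atleast_1d(np.asarray(z, dtype=complex))
    return (omega / (zeta - z[:, None])).sum(axis=1)

alpha, beta, n_c = 1.6, 1.7, 8
zeta, omega = circle_quadrature(alpha, beta, n_c)
# Filter values at the center of the interval, at an endpoint, and well outside it.
values = rho([1.65, 1.7, 1.9], zeta, omega).real
```

For this particular rule the filter is close to 1 at the center of the interval, equals 1/2 at the endpoints, and decays quickly outside, which is the sense in which CINT acts as a rational filter.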

Main characteristics of CINT
- Can be seen as a (rational) filtering technique
- Different levels of parallelism
- Eigenvalue problem → Linear systems with multiple right-hand sides
In this talk
- We study contour integration from a Domain Decomposition (DD) point of view
- Two ideas:
  - Use DD to derive CINT schemes
  - Use DD to accelerate FEAST or another CINT-based method
- We target parallel architectures

Partitioning of the domain (Metis, Scotch, ...)
Figure: An edge-separator (vertex-based partitioning)

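The local viewpoint - assume M partitions
Stack the interior variables $u_1, u_2, \ldots, u_M$ into $u$, followed by the interface variables $y$:
$$\begin{pmatrix} B_1 & & & & E_1 \\ & B_2 & & & E_2 \\ & & \ddots & & \vdots \\ & & & B_M & E_M \\ E_1^T & E_2^T & \cdots & E_M^T & C \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_M \\ y \end{pmatrix} = \lambda \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_M \\ y \end{pmatrix}$$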

Pictorially:
Write as
$$A = \begin{pmatrix} B & E \\ E^T & C \end{pmatrix}.$$

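Expressing $(A - \zeta I)^{-1}$ in DD
Let $\zeta \in \mathbb{C}$ and recall that
$$A = \begin{pmatrix} B & E \\ E^T & C \end{pmatrix}.$$
Then
$$(A - \zeta I)^{-1} = \begin{pmatrix} (B - \zeta I)^{-1} + F(\zeta) S(\zeta)^{-1} F(\zeta)^T & -F(\zeta) S(\zeta)^{-1} \\ -S(\zeta)^{-1} F(\zeta)^T & S(\zeta)^{-1} \end{pmatrix},$$
where
$$F(\zeta) = (B - \zeta I)^{-1} E, \qquad S(\zeta) = C - \zeta I - E^T (B - \zeta I)^{-1} E.$$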
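The block expression of $(A - \zeta I)^{-1}$ can be checked numerically. The sketch below assumes small random dense blocks and an arbitrary complex shift, all hypothetical choices made only for the check. Explicit inverses are used for readability, not as a recommendation.

```python
import numpy as np

rng = np.random.default_rng(1)
d, s = 6, 3                       # hypothetical sizes: interior and interface unknowns
B = rng.standard_normal((d, d)); B = B + B.T
C = rng.standard_normal((s, s)); C = C + C.T
E = rng.standard_normal((d, s))
A = np.block([[B, E], [E.T, C]])
zeta = 0.3 + 0.5j                 # any complex shift off the real axis

Bz_inv = np.linalg.inv(B - zeta * np.eye(d))
F = Bz_inv @ E                                   # F(zeta) = (B - zeta I)^{-1} E
S = C - zeta * np.eye(s) - E.T @ Bz_inv @ E      # S(zeta), complex symmetric
S_inv = np.linalg.inv(S)

# Plain transposes (not conjugate transposes), matching the formula.
block_inverse = np.block([
    [Bz_inv + F @ S_inv @ F.T, -F @ S_inv],
    [-S_inv @ F.T,             S_inv],
])
direct_inverse = np.linalg.inv(A - zeta * np.eye(d + s))
error = np.abs(block_inverse - direct_inverse).max()
```

The two inverses agree to rounding error, which confirms the signs of the off-diagonal blocks used in the rest of the derivation.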

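Spectral projectors and DD
As previously,
$$(A - \zeta I)^{-1} = \begin{pmatrix} (B - \zeta I)^{-1} + F(\zeta) S(\zeta)^{-1} F(\zeta)^T & -F(\zeta) S(\zeta)^{-1} \\ -S(\zeta)^{-1} F(\zeta)^T & S(\zeta)^{-1} \end{pmatrix}.$$
Then,
$$P_{DD} = -\frac{1}{2i\pi} \int_{\Gamma} (A - \zeta I)^{-1} \, d\zeta \equiv \begin{pmatrix} H & -W \\ -W^T & G \end{pmatrix},$$
with
$$H = -\frac{1}{2i\pi} \int_{\Gamma} \left[ (B - \zeta I)^{-1} + F(\zeta) S(\zeta)^{-1} F(\zeta)^T \right] d\zeta,$$
$$G = -\frac{1}{2i\pi} \int_{\Gamma} S(\zeta)^{-1} \, d\zeta, \qquad W = -\frac{1}{2i\pi} \int_{\Gamma} F(\zeta) S(\zeta)^{-1} \, d\zeta.$$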

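Extracting approximate eigenspaces
Let $\hat{V}$ be a set of multiple right-hand sides (mrhs) to multiply $P$:
$$P_{DD} \begin{pmatrix} \hat{V}_u \\ \hat{V}_s \end{pmatrix} = \begin{pmatrix} H\hat{V}_u - W\hat{V}_s \\ -W^T \hat{V}_u + G\hat{V}_s \end{pmatrix} = \begin{pmatrix} Z_u \\ Z_s \end{pmatrix},$$
with
$$Z_u = -\frac{1}{2i\pi} \int_{\Gamma} (B - \zeta I)^{-1} \hat{V}_u \, d\zeta + \frac{1}{2i\pi} \int_{\Gamma} F(\zeta) S(\zeta)^{-1} \left[ \hat{V}_s - F(\zeta)^T \hat{V}_u \right] d\zeta,$$
$$Z_s = -\frac{1}{2i\pi} \int_{\Gamma} S(\zeta)^{-1} \left[ \hat{V}_s - F(\zeta)^T \hat{V}_u \right] d\zeta.$$
In practice, with the factor $-\frac{1}{2i\pi}$ absorbed into the weights $\omega_j$:
$$Z_u = \sum_{j=1}^{n_c} \omega_j (B - \zeta_j I)^{-1} \hat{V}_u - \sum_{j=1}^{n_c} \omega_j F(\zeta_j) S(\zeta_j)^{-1} \left[ \hat{V}_s - F(\zeta_j)^T \hat{V}_u \right],$$
$$Z_s = \sum_{j=1}^{n_c} \omega_j S(\zeta_j)^{-1} \left[ \hat{V}_s - F(\zeta_j)^T \hat{V}_u \right].$$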

Pseudocode - Full projector (DD-FP)
for j = 1 to n_c do
    W_u := (B − ζ_j I)^{-1} V̂_u                                  (local)
    W_s := V̂_s − E^T W_u                                          (local)
    W_s := S(ζ_j)^{-1} W_s;  Z_s := Z_s + ω_j W_s                  (distributed)
    W_u := W_u − (B − ζ_j I)^{-1} E W_s;  Z_u := Z_u + ω_j W_u     (local)
end for
Practical considerations
- For each ζ_j, j = 1, ..., n_c: two solves with B − ζ_j I plus one solve with S(ζ_j)
- The procedure can be repeated with an updated V̂
- Equivalent to FEAST tied with a DD solver
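The DD-FP loop can be sketched in a few lines of dense NumPy. This is a serial illustration under assumptions, not the parallel C++/PETSc implementation described in the talk: the Schur complement is formed explicitly, the poles and weights are arbitrary hypothetical values, and the function name `dd_full_projector` is made up. The result is compared against applying $\sum_j \omega_j (A - \zeta_j I)^{-1}$ to $\hat{V}$ directly.

```python
import numpy as np

def dd_full_projector(B, E, C, V_u, V_s, zeta, omega):
    """Accumulate sum_j omega_j (A - zeta_j I)^{-1} V block-wise, following the DD-FP steps."""
    d, s = E.shape
    Z_u = np.zeros(V_u.shape, dtype=complex)
    Z_s = np.zeros(V_s.shape, dtype=complex)
    for z, w in zip(zeta, omega):
        Bz = B - z * np.eye(d)
        W_u = np.linalg.solve(Bz, V_u)                         # first solve with B - zeta_j I
        W_s = V_s - E.T @ W_u
        S = C - z * np.eye(s) - E.T @ np.linalg.solve(Bz, E)   # S(zeta_j), formed explicitly here
        W_s = np.linalg.solve(S, W_s)                          # solve with S(zeta_j)
        Z_s += w * W_s
        W_u = W_u - np.linalg.solve(Bz, E @ W_s)               # second solve with B - zeta_j I
        Z_u += w * W_u
    return Z_u, Z_s

rng = np.random.default_rng(2)
d, s, m = 8, 3, 2                 # hypothetical sizes and number of right-hand sides
B = rng.standard_normal((d, d)); B = B + B.T
C = rng.standard_normal((s, s)); C = C + C.T
E = rng.standard_normal((d, s))
A = np.block([[B, E], [E.T, C]])
V = rng.standard_normal((d + s, m))
zeta = np.array([0.2 + 0.4j, -0.1 + 0.7j, 0.2 - 0.4j, -0.1 - 0.7j])   # hypothetical poles
omega = np.array([0.3 + 0.1j, 0.2 - 0.2j, 0.3 - 0.1j, 0.2 + 0.2j])    # hypothetical weights

Z_u, Z_s = dd_full_projector(B, E, C, V[:d], V[d:], zeta, omega)
Z_ref = sum(w * np.linalg.solve(A - z * np.eye(d + s), V) for z, w in zip(zeta, omega))
error = np.abs(np.vstack([Z_u, Z_s]) - Z_ref).max()
```

In a distributed setting the solves with $B - \zeta_j I$ decouple across subdomains, while the solve with $S(\zeta_j)$ is the step that requires communication.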

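An alternative scheme
CINT along the interface unknowns
$$P_{DD} = -\frac{1}{2i\pi} \int_{\Gamma} (A - \zeta I)^{-1} \, d\zeta = [P_1, P_2], \qquad P_2 = \begin{pmatrix} -W \\ G \end{pmatrix},$$
$$G = -\frac{1}{2i\pi} \int_{\Gamma} S(\zeta)^{-1} \, d\zeta, \qquad W = -\frac{1}{2i\pi} \int_{\Gamma} (B - \zeta I)^{-1} E S(\zeta)^{-1} \, d\zeta.$$
Advantage: does not involve the inverse of the whole matrix.
$$P_{DD} = XX^T, \quad X = \begin{pmatrix} U \\ Y \end{pmatrix} \quad \Longrightarrow \quad P_2 = \begin{pmatrix} UY^T \\ YY^T \end{pmatrix} = XY^T$$
- Just capture the range of $P_2 = XY^T$: $P_2 \times \mathrm{randn}()$
- Also: Lanczos on $P_2 P_2^T$ (sequential, doubles the work)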

Pseudocode - Partial projector (DD-PP)
for j = 1 to n_c do
    W_s := S(ζ_j)^{-1} V̂_s;  Z_s := Z_s + ω_j W_s              (distributed)
    W_u := −(B − ζ_j I)^{-1} E W_s;  Z_u := Z_u + ω_j W_u       (local)
end for
Practical considerations
- For each ζ_j, j = 1, ..., n_c: one solve with B − ζ_j I plus one solve with S(ζ_j)
- More like a one-shot method
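A minimal sketch of the partial projector as a one-shot method follows. It applies the interface block column $P_2$ to a random block and then runs a Rayleigh-Ritz step on the captured range. The test matrix, the midpoint rule on a circle, the choice of three right-hand sides (matching the known number of eigenvalues in the interval), and the name `dd_partial_projector` are all assumptions for illustration. The overall sign of the weights does not affect the captured range.

```python
import numpy as np

def dd_partial_projector(B, E, C, V_s, zeta, omega):
    """Interface block column of sum_j omega_j (A - zeta_j I)^{-1}, applied to V_s."""
    d, s = E.shape
    Z_u = np.zeros((d, V_s.shape[1]), dtype=complex)
    Z_s = np.zeros(V_s.shape, dtype=complex)
    for z, w in zip(zeta, omega):
        Bz = B - z * np.eye(d)
        S = C - z * np.eye(s) - E.T @ np.linalg.solve(Bz, E)   # S(zeta_j), formed explicitly here
        W_s = np.linalg.solve(S, V_s)                          # one solve with S(zeta_j)
        W_u = -np.linalg.solve(Bz, E @ W_s)                    # one solve with B - zeta_j I
        Z_s += w * W_s
        Z_u += w * W_u
    return np.vstack([Z_u, Z_s])

# Hypothetical test matrix with three eigenvalues inside [alpha, beta] = [0.9, 2.1].
rng = np.random.default_rng(3)
d, s = 8, 4
inside = np.array([1.2, 1.5, 1.8])
spectrum = np.concatenate([inside, np.linspace(4.0, 10.0, d + s - 3)])
Q, _ = np.linalg.qr(rng.standard_normal((d + s, d + s)))
A = Q @ np.diag(spectrum) @ Q.T
A = 0.5 * (A + A.T)
B, E, C = A[:d, :d], A[:d, d:], A[d:, d:]

# Midpoint rule on the circle through alpha and beta (an assumed quadrature choice).
alpha, beta, n_c = 0.9, 2.1, 16
theta = 2.0 * np.pi * (np.arange(n_c) + 0.5) / n_c
zeta = 0.5 * (alpha + beta) + 0.5 * (beta - alpha) * np.exp(1j * theta)
omega = 0.5 * (beta - alpha) * np.exp(1j * theta) / n_c

# P_2 x randn(): one random block on the interface, then Rayleigh-Ritz on the captured range.
Z = dd_partial_projector(B, E, C, rng.standard_normal((s, 3)), zeta, omega).real
Qz, _ = np.linalg.qr(Z)
ritz = np.linalg.eigvalsh(Qz.T @ A @ Qz)
```

Because the random block lives only on the interface, the number of sought eigenvalues cannot exceed the number of interface unknowns in this sketch.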

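A more detailed analysis of DD-PP
Spectral Schur complement
$$\lambda \in \sigma(A) \iff \det[S(\lambda)] = 0 \qquad (\lambda \notin \sigma(B))$$
The eigenvector satisfies
$$x = \begin{pmatrix} -(B - \lambda I)^{-1} E y \\ y \end{pmatrix}, \quad \text{with } y \text{ such that } S(\lambda) y = 0.$$
If $(B - \zeta I)^{-1}$ is analytic in $[\alpha, \beta]$:
- $W = -\frac{1}{2i\pi} \int_{\Gamma} (B - \zeta I)^{-1} E S(\zeta)^{-1} \, d\zeta$ captures $\{ (B - \lambda I)^{-1} E y \}_{\Gamma}$
- $G = -\frac{1}{2i\pi} \int_{\Gamma} S(\zeta)^{-1} \, d\zeta$ captures $\{ y \}_{\Gamma}$
Another idea: solve $S(\lambda) y = 0$ directly ([VK,RLi,YS], [VanB,Kra], [Sak])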
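The relation between eigenpairs of $A$ and null vectors of $S(\lambda)$ can be checked on a small example. The sketch below assumes random dense blocks and picks the smallest eigenvalue of $A$, which by interlacing lies below the spectrum of $B$, so that $B - \lambda I$ is invertible. The function name `spectral_schur` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
d, s = 7, 3                       # hypothetical sizes: interior and interface unknowns
B = rng.standard_normal((d, d)); B = B + B.T
C = rng.standard_normal((s, s)); C = C + C.T
E = rng.standard_normal((d, s))
A = np.block([[B, E], [E.T, C]])

def spectral_schur(lam):
    """S(lambda) = C - lambda I - E^T (B - lambda I)^{-1} E, for lambda not in sigma(B)."""
    return C - lam * np.eye(s) - E.T @ np.linalg.solve(B - lam * np.eye(d), E)

lam = np.linalg.eigvalsh(A)[0]    # smallest eigenvalue of A
S = spectral_schur(lam)

# S(lam) is symmetric and singular; its null vector y is the interface part of the eigenvector.
w, Y = np.linalg.eigh(0.5 * (S + S.T))
k = np.argmin(np.abs(w))
y = Y[:, k]
x = np.concatenate([-np.linalg.solve(B - lam * np.eye(d), E @ y), y])
residual = np.linalg.norm(A @ x - lam * x) / np.linalg.norm(x)
smallest = np.abs(w[k])
```

The reconstructed vector `x` satisfies the eigenvalue equation to rounding error, which is the identity behind the idea of solving $S(\lambda) y = 0$ directly.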

A closer look at the Schur complement
So far: Eigenvalue problem → Linear systems with mrhs → Schur complement
From the DD framework we have
$$S(\zeta) = \begin{pmatrix} S_1(\zeta) & E_{12} & \cdots & E_{1M} \\ E_{21} & S_2(\zeta) & \cdots & E_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ E_{M1} & E_{M2} & \cdots & S_M(\zeta) \end{pmatrix},$$
where
$$S_i(\zeta) = C_i - \zeta I - E_i^T (B_i - \zeta I)^{-1} E_i, \quad i = 1, \ldots, M,$$
is the local Schur complement (complex symmetric).

Solving linear systems with the Schur complement
Straightforward approach
- Form and factorize S(ζ)
- Extremely robust but impractical for 3D problems
Alternative
- Use an approximation of S(ζ)
- Lots of ideas (pARMS, LORASC, ...)
Typical preconditioners implemented:
- Block Jacobi: use C_i, C_i − ζI or S_i(ζ), i = 1, ..., M
- Global approximation: use C, C − ζI or S(ζ)
- Memory vs. robustness
Important: the magnitude of the imaginary part of a pole

Implementation and computing environment
Hardware
- ITASCA HP Linux cluster at the Minnesota Supercomputing Institute
- 1,091 HP ProLiant BL280c G6 blade servers, each with two-socket, quad-core 2.8 GHz Intel Xeon X5560 Nehalem EP (24 GB per node)
- 40-gigabit QDR InfiniBand (IB) interconnect
Software
- The software was written in C++ on top of PETSc (MPI)
- Linked to AMD, METIS, UMFPACK, MUMPS, MKL-BLAS, MKL-LAPACK
- Compiled with mpiicpc (-O3)

Parallel implementation
Figure: a 4 × 4 grid of MPI processes, labeled (0,0) through (3,3), organized by two communicators, COMM_DD and COMM_XX.
- Different levels of parallelism (1-D, 2-D, 3-D grids)
- COMM_DD is the communicator used for Domain Decomposition
- Different choices for XX:
  - XX = mrhs: parallelize over the multiple right-hand sides
  - XX = poles: parallelize over the poles
- Selection depends on the linear solver (direct or iterative)

Experimental framework
CINT + Subspace Iteration
- CINT-SI: standard FEAST approach
- Direct (MUMPS) or iterative (preconditioned) solver
CINT + DD
- DD-FP: implements the full projector
- DD-PP: implements the partial projector
- Schur complement: exact or approximate
Details
- # MPI processes = # cores
- Quadrature rule: Gauss-Legendre
- Eigenvalue tolerance: 1e-8

Numerical illustration
A comparison of DD-FP and DD-PP I
Figure: 2D Laplacian, 50 × 50, [α, β] = [1.6, 1.7]. Average relative residual versus iterations for DD-FP (n_c = 4) and DD-PP (n_c = 4, 8, 12).

Numerical illustration (cont. from previous)
A comparison of DD-FP and DD-PP II

Test on a 2D 1001 × 1000 Laplacian
Table: Time is listed in seconds. A 2-D grid of processors was used. S(ζ) factorized explicitly. Number of poles: n_c = 4.

                          MPI 16 × 4        MPI 32 × 4        MPI 64 × 4
[α, β]            Its   CINT-SI  DD-FP    CINT-SI  DD-FP    CINT-SI  DD-FP
Exterior eigvls
[λ_1, λ_20]        3       88      37        53      26        42      20
[λ_1, λ_50]        3      159      65        88      38        65      30
[λ_1, λ_100]       5      432     172       241      98       136      65
Interior eigvls
[λ_201, λ_220]     3       89      37        53      26        42      20
[λ_201, λ_250]     4      286     113       164      67       110      48
[λ_201, λ_300]     4      440     214       245     122       141      84

Exterior: number of right-hand sides = number of eigvls + 20
Interior: number of right-hand sides = 2 × number of eigvls

Efficiency of DD-FP
Figure: Efficiency of DD-FP (from the previous table), shown as a ratio versus the number of MPI processes, for the intervals [λ_1, λ_50], [λ_201, λ_250], [λ_1, λ_100] and [λ_201, λ_300].

Conclusion
In this talk
- We presented a DD-based form of contour integration
- DD can be used to:
  - Solve the linear systems in numerical integration
  - Derive DD-based contour integration methods
- The Schur complement holds a central role
Considerations
- Higher moments of the Schur complement integral
- Block Krylov subspaces
- Preconditioning of indefinite linear systems