Large-scale eigenvalue computations with SLEPc

1 Large-scale eigenvalue computations with SLEPc
Jose E. Roman, Universitat Politècnica de València, Spain
Joint work with: A. Tomás, E. Romero, E. Ramos
RES Scientific Seminar on Parallel Simulations on the Network, 30 November, 2010
Grid y Computación de Altas Prestaciones

2 Outline 1 Eigenvalue Problems 2 Numerical Libraries The SLEPc Project Basic Usage Parallel Performance 3 4

3 Eigenvalue Problems

4 Eigenproblems
Large-scale eigenvalue problems are among the most demanding calculations in scientific computing.
Example application areas:
- Dynamic structural analysis (e.g., civil engineering)
- Stability analysis (e.g., control engineering)
- Eigenfunction determination (e.g., electromagnetics)
- Bifurcation analysis (e.g., fluid dynamics)
- Information retrieval (e.g., latent semantic indexing)

5 Example 1: Structural Analysis
Smallest eigenfrequencies of a real symmetric problem $Kx = \lambda Mx$

6 Example 2: FE in Schrödinger Equation
Eigenstates of simple harmonic oscillator, $(A - \varepsilon_n B)\,\psi_n = 0$
[Figure: eigenstates for λ = 1, λ = 2, λ = 3]

7 Example 3: Neutron Diffusion
Modal analysis of nuclear reactor cores
Objectives:
- Improve safety
- Reduce operation costs
Lambda Modes Equation: $L\phi = \frac{1}{\lambda} M\phi$
Target: modes associated with the largest $\lambda$
- Criticality (eigenvalues)
- Prediction of instabilities and transient analysis (eigenvectors)

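9 Example 3: Neutron Diffusion - Algebraic Eigenproblem
Discretized eigenproblem (two energy groups):
$$\begin{bmatrix} L_{11} & 0 \\ -L_{21} & L_{22} \end{bmatrix} \begin{bmatrix} \psi_1 \\ \psi_2 \end{bmatrix} = \frac{1}{\lambda} \begin{bmatrix} M_{11} & M_{12} \\ 0 & 0 \end{bmatrix} \begin{bmatrix} \psi_1 \\ \psi_2 \end{bmatrix}$$
Can be formulated as $G\psi_1 = \lambda\psi_1$, with $G = L_{11}^{-1}\left(M_{11} + M_{12} L_{22}^{-1} L_{21}\right)$
- The matrix $G$ should not be computed explicitly
- In some applications, many successive problems are solved
- Other modes may be of interest
[Verdú, Ginestar, R., Vidal, 2010]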
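The reduction to $G$ can be checked by block elimination. The lines below are a sketch under two assumptions that the stated form of $G$ implies but the slide does not spell out: the (2,1) block enters with a minus sign and the second block row of the right-hand side is zero. $L_{11}$ and $L_{22}$ are also assumed nonsingular.

```latex
\begin{align*}
-L_{21}\psi_1 + L_{22}\psi_2 &= 0
  &&\Longrightarrow\quad \psi_2 = L_{22}^{-1} L_{21}\,\psi_1, \\
L_{11}\psi_1 &= \tfrac{1}{\lambda}\left(M_{11}\psi_1 + M_{12}\psi_2\right)
  &&\Longrightarrow\quad
  L_{11}^{-1}\left(M_{11} + M_{12} L_{22}^{-1} L_{21}\right)\psi_1 = \lambda\,\psi_1 .
\end{align*}
```

Applying $G$ to a vector therefore needs only matrix-vector products with $L_{21}$, $M_{11}$ and $M_{12}$ plus one linear solve each with $L_{22}$ and $L_{11}$, which is why $G$ never has to be formed.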

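10 Example 4: Computational Electromagnetics
Analysis of resonant cavities
Source-free wave equations:
$$\nabla \times (\hat{\mu}_r^{-1}\, \nabla \times E) - \kappa_0^2\, \hat{\varepsilon}_r E = 0$$
$$\nabla \times (\hat{\varepsilon}_r^{-1}\, \nabla \times H) - \kappa_0^2\, \hat{\mu}_r H = 0$$
FEM discretization leads to $Ax = \kappa_0^2 Bx$
Target: smallest nonzero eigenfrequencies (large nullspace)
$$\underbrace{\lambda_1, \lambda_2, \ldots, \lambda_k}_{=0},\ \underbrace{\lambda_{k+1}, \lambda_{k+2}, \ldots, \lambda_n}_{\text{Target}}$$
Eigenfunctions associated with 0 are irrotational electric fields, $E = -\nabla\Phi$. This allows the computation of a basis of $\mathcal{N}(A)$
Constrained Eigenvalue Problem:
$$Ax = \kappa_0^2 Bx, \qquad C^T Bx = 0$$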

11 Example 5: Damped Track System
- Underground tracks laid on concrete
- To prevent vibrations, sleepers are laid on elastic, vibration-damping matting
Quadratic eigenvalue problem (QEP): $(\lambda^2 M + \lambda C + K)x = 0$
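A QEP is commonly solved by rewriting it as a generalized eigenproblem of twice the size. The block below shows one standard companion linearization as a sketch; it is one of several possible choices and is not claimed to be the specific form used for this application.

```latex
\begin{equation*}
\begin{bmatrix} 0 & I \\ -K & -C \end{bmatrix}
\begin{bmatrix} x \\ \lambda x \end{bmatrix}
= \lambda
\begin{bmatrix} I & 0 \\ 0 & M \end{bmatrix}
\begin{bmatrix} x \\ \lambda x \end{bmatrix}
\end{equation*}
```

The first block row is the identity $\lambda x = \lambda x$, and the second block row reads $-Kx - \lambda Cx = \lambda^2 Mx$, which is the original quadratic problem.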

12 Many Flavours of Eigenvalue Problems
Various problem characteristics
- Real/complex, Hermitian/non-Hermitian
- Need to support complex in real arithmetic
Many formulations
- Standard, generalized, quadratic, non-linear, SVD
- Special cases: implicit matrix, block-structured problems, constrained problems, structured spectrum
Wanted solutions
- Usually only a few eigenpairs, but may be many
- Any part of the spectrum (exterior, interior), intervals
Robustness and usability
- Singular B and other special cases, high multiplicity
- Interoperability (linear solvers, FE), flexibility

13 Eigenvalue Problems
Consider the following eigenvalue problems:
- Standard Eigenproblem: $Ax = \lambda x$
- Generalized Eigenproblem: $Ax = \lambda Bx$
where
- $\lambda$ is a (complex) scalar: eigenvalue
- $x$ is a (complex) vector: eigenvector
- Matrices $A$ and $B$ can be real or complex
- Matrices $A$ and $B$ can be symmetric (Hermitian) or not
- Typically, $B$ is symmetric positive (semi-)definite

15 Solution of the Eigenvalue Problem
There are $n$ eigenvalues (counted with their multiplicities)
Partial eigensolution: nev solutions
$$\lambda_0, \lambda_1, \ldots, \lambda_{nev-1} \in \mathbb{C}, \qquad x_0, x_1, \ldots, x_{nev-1} \in \mathbb{C}^n$$
nev = number of eigenvalues / eigenvectors (eigenpairs)
Different requirements:
- Compute a few of the dominant eigenvalues (largest magnitude)
- Compute a few $\lambda_i$'s with smallest or largest real parts
- Compute all $\lambda_i$'s in a certain region of the complex plane
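To make the "dominant eigenvalue" requirement concrete, here is a minimal sketch of plain power iteration on a small dense real matrix. It is a textbook method shown for illustration only, not one of the solvers discussed in this talk, and it assumes a single real eigenvalue of largest magnitude. The names `matvec` and `power_iteration` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* y = A*x for a dense n-by-n matrix stored row by row. */
static void matvec(size_t n, const double *a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++) {
        double sum = 0.0;
        for (size_t j = 0; j < n; j++)
            sum += a[i * n + j] * x[j];
        y[i] = sum;
    }
}

/* Power iteration on a dense matrix (illustrative helper, hypothetical name).
   x:    nonzero start vector on entry, approximate eigenvector on exit
   work: scratch array of length n
   Returns the Rayleigh quotient x'Ax / x'x of the last iterate. */
double power_iteration(size_t n, const double *a, double *x, double *work,
                       int max_it)
{
    assert(n > 0 && max_it > 0);
    double lambda = 0.0;
    for (int it = 0; it < max_it; it++) {
        double xx = 0.0, xax = 0.0, largest = 0.0;
        matvec(n, a, x, work);                  /* work = A x */
        for (size_t i = 0; i < n; i++) {
            double mag = work[i] < 0.0 ? -work[i] : work[i];
            xx  += x[i] * x[i];
            xax += x[i] * work[i];
            if (mag > largest) largest = mag;
        }
        assert(xx > 0.0);                       /* start vector must be nonzero */
        lambda = xax / xx;                      /* Rayleigh quotient */
        if (largest == 0.0) break;              /* A x = 0: nothing to rescale */
        for (size_t i = 0; i < n; i++)          /* rescale by the largest entry */
            x[i] = work[i] / largest;
    }
    return lambda;
}
```

Convergence slows down as the ratio between the two largest eigenvalue magnitudes approaches 1, which is one reason production codes rely on Krylov or Davidson-type solvers instead.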

18 Spectral Transformation
A general technique that can be used in many methods
$$Ax = \lambda x \quad\Longrightarrow\quad Tx = \theta x$$
In the transformed problem
- Eigenvalues are remapped, eigenvectors are not altered
- Convergence is usually improved (better separation)
Shift-and-invert: $T_{SI} = (A - \sigma I)^{-1}$
Cayley: $T_C = (A - \sigma I)^{-1}(A + \tau I)$
- Can be used to compute interior eigenvalues
- $T$ is not computed explicitly; linear solves are used instead
- Cheaper alternative: preconditioned eigensolvers (J-D)
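The effect of the two transformations on the eigenvalues follows directly from their definitions: shift-and-invert maps $\lambda$ to $\theta = 1/(\lambda - \sigma)$ and Cayley maps it to $\theta = (\lambda + \tau)/(\lambda - \sigma)$. The sketch below applies these maps to real eigenvalues only, to show why the eigenvalue closest to the shift becomes dominant after shift-and-invert. All function names are hypothetical and are not part of any library.

```c
#include <assert.h>
#include <stddef.h>

/* Shift-and-invert: theta = 1 / (lambda - sigma). */
double sinvert_forward(double lambda, double sigma)
{
    assert(lambda != sigma);
    return 1.0 / (lambda - sigma);
}

/* Recover lambda from theta: lambda = sigma + 1 / theta. */
double sinvert_backward(double theta, double sigma)
{
    assert(theta != 0.0);
    return sigma + 1.0 / theta;
}

/* Cayley: theta = (lambda + tau) / (lambda - sigma). */
double cayley_forward(double lambda, double sigma, double tau)
{
    assert(lambda != sigma);
    return (lambda + tau) / (lambda - sigma);
}

/* Recover lambda from theta: lambda = (theta*sigma + tau) / (theta - 1). */
double cayley_backward(double theta, double sigma, double tau)
{
    assert(theta != 1.0);
    return (theta * sigma + tau) / (theta - 1.0);
}

/* Index of the eigenvalue whose shift-and-invert image has largest magnitude,
   i.e. the one a dominant-eigenvalue method would find first. */
size_t dominant_after_sinvert(const double *lambda, size_t n, double sigma)
{
    assert(n > 0);
    size_t best = 0;
    double best_mag = 0.0;
    for (size_t i = 0; i < n; i++) {
        double theta = sinvert_forward(lambda[i], sigma);
        double mag = theta < 0.0 ? -theta : theta;
        if (mag > best_mag) { best_mag = mag; best = i; }
    }
    return best;
}
```

For a made-up spectrum {0.5, 1.9, 2.2, 7.0} and shift 2.1, the image of 2.2 has the largest magnitude, so an interior eigenvalue of the original problem becomes an exterior one of the transformed problem.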

20 What Users Need
Provided by PETSc
- Abstraction of mathematical objects: vectors and matrices
- Efficient linear solvers and preconditioners
- Easy programming interface
- Run-time flexibility, full control over the solution process
- Parallel computing, mostly transparent to the user
Provided by SLEPc
- State-of-the-art eigensolvers
- Spectral transformations

25 Large Scientific Codes: Common Programming Practice
Components of a typical code: Algorithmic Implementations, Application Data Layout, Control, I/O, Tuned and machine dependent modules
- Changes in algorithms sometimes lead to several years' advancement in computations. This needs flexibility!
- Performance is influenced by system parameters and by steps in the algorithm. Critical points: portability and scalability.
- A new architecture requires extensive tuning and may even require new programming paradigms. This is difficult to maintain and not very portable.

28 Large Scientific Codes: An Alternative Approach
- User's Application Code (Main Control)
- Compilers + Expert Drivers + Support
- Application Data Layout: AVAILABLE LIBRARIES
- Algorithmic Implementations: AVAILABLE LIBRARIES
- I/O: AVAILABLE LIBRARIES
- Tuned and machine dependent modules
Goal: minimize the time to solution (prototype and production); maximize software life and resource utilization

30 HPC Software Stack
- APPLICATIONS
- GENERAL PURPOSE TOOLS: PETSc, SLEPc, MUMPS, SuperLU, FFTW, SUNDIALS, Trilinos, Hypre, ScaLAPACK,...
- PLATFORM SUPPORT TOOLS & UTILITIES
- HARDWARE

31 Requirements of Sustainable Software
1. Robust: maintained across platforms, compiler independent, precision independent, error handling, checkpointing
2. Scalable (across large petascale systems)
3. Extensible (new algorithms, new techniques)
4. Interoperable (frameworks, PSEs, tool-to-tool, components)
5. User friendly interfaces, well documented
6. Periodic tests and evaluations (versions of tools, OS, compiler)
7. Portability and fast adaptability
8. Long-term support
9. Training (hands-on examples)
10. Community support (developers, user groups, mailing lists)

32 PETSc: Portable, Extensible Toolkit for Scientific Computation
Software for the scalable (parallel) solution of algebraic systems arising from partial differential equation (PDE) simulations
- Developed at Argonne National Lab since 1991
- Usable from C, C++, Fortran77/90
- Focus on abstraction, portability, interoperability
- Extensive documentation and examples
- Freely available and supported through
- Current version: 3.1 (released March 2010)

33 SLEPc: Scalable Library for Eigenvalue Problem Computations
A general library for solving large-scale sparse eigenproblems on parallel computers
- For standard and generalized eigenproblems
- For real and complex arithmetic
- For Hermitian or non-Hermitian problems
- Also support for SVD and QEP
$$Ax = \lambda x \qquad Ax = \lambda Bx \qquad Av_i = \sigma_i u_i \qquad (\lambda^2 M + \lambda C + K)x = 0$$
Current version: 3.1 (released Aug 2010)

34 The SLEPc Project
- Project started around releases since 2002
- More than 5,300 downloads (average 100+ downloads per month)
- Currently funded by MICINN
[Figure: SLEPc downloads in the last 22 months; series: downloads, average]

36 PETSc/SLEPc Numerical Components
PETSc
- Nonlinear Systems: Line Search, Trust Region, Other
- Time Steppers: Euler, Backward Euler, Pseudo Time Step, Other
- Krylov Subspace Methods: GMRES, CG, CGS, Bi-CGStab, TFQMR, Richardson, Chebychev, Other
- Preconditioners: Additive Schwarz, Block Jacobi, Jacobi, ILU, ICC, LU, Other
- Matrices: Compressed Sparse Row, Block Compressed Sparse Row, Block Diagonal, Dense, Other
- Vectors
- Index Sets: Indices, Block Indices, Stride, Other
SLEPc
- SVD Solvers: Cross Product, Cyclic Matrix, Lanczos, Thick R. Lanczos
- Quadratic: Linearization, Q-Arnoldi
- Eigensolvers: Krylov-Schur, Arnoldi, Lanczos, GD, JD, Other
- Spectral Transformation: Shift, Shift-and-invert, Cayley, Fold, Preconditioner
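The "Cross Product" and "Cyclic Matrix" entries among the SVD solvers refer to two standard ways of turning the singular value problem $Av_i = \sigma_i u_i$ into a Hermitian eigenproblem. The block below states the two textbook formulations as a sketch; it does not describe implementation details of the library.

```latex
\begin{align*}
\text{Cross product:}\quad & A^{*}A\,v_i = \sigma_i^{2}\,v_i, \\
\text{Cyclic matrix:}\quad &
\begin{bmatrix} 0 & A \\ A^{*} & 0 \end{bmatrix}
\begin{bmatrix} u_i \\ v_i \end{bmatrix}
= \sigma_i
\begin{bmatrix} u_i \\ v_i \end{bmatrix}.
\end{align*}
```

Both follow from the pair of relations $Av_i = \sigma_i u_i$ and $A^{*}u_i = \sigma_i v_i$. The cross product form works with the squared singular values, which can cost accuracy for the smallest ones, while the cyclic form doubles the problem size.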

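37 EPS: Simple Example Program
    EPS         eps;       /* eigensolver context */
    Mat         A, B;      /* matrices of Ax=kBx, assumed assembled beforehand */
    Vec         xr, xi;    /* eigenvector, x (real and imaginary parts) */
    PetscScalar kr, ki;    /* eigenvalue, k (real and imaginary parts) */
    PetscInt    i, nconv;  /* loop index, number of converged eigenpairs */

    EPSCreate(PETSC_COMM_WORLD, &eps);
    EPSSetOperators(eps, A, B);
    EPSSetProblemType(eps, EPS_GNHEP);
    EPSSetFromOptions(eps);    /* pick up run-time options */
    EPSSolve(eps);
    EPSGetConverged(eps, &nconv);
    for (i=0; i<nconv; i++) {
      EPSGetEigenpair(eps, i, &kr, &ki, xr, xi);
    }
    EPSDestroy(eps);           /* SLEPc 3.1 signature; later releases take &eps */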

38 EPS: Run-Time Examples
    $ program -eps_type krylovschur -eps_nev 6 -eps_ncv 24
    $ program -eps_type gd -eps_smallest_real -eps_max_it 2000
    $ program -eps_type jd -eps_jd_blocksize 2 -eps_tol 1e-8
    $ program -eps_target 2.1 -eps_harmonic
    $ program -eps_target 2.1 -st_type sinvert -st_ksp_type gmres -st_ksp_rtol 1e-8 -st_pc_type sor -st_pc_sor_omega 1.3
    $ program -eps_view -eps_monitor -log_summary

39 Computers from RES
IBM BladeCenter clusters with Myrinet interconnect
- 256 JS20 nodes
- Two 64-bit PowerPC 970FX @ 2.2 GHz processors
We have used three nodes:
- MareNostrum (Barcelona)
- CaesarAugusta (Zaragoza)
- Tirant (Valencia)

40 Parallel Performance in MareNostrum
Speed-up: $S_p = T_s / T_p$
- Matrix PRE2 (University of Florida): dimension 659,033, with 5,834,044 nonzeros
- 10 eigenvalues with 30 basis vectors
- 1 processor per node
[Figure: speed-up versus number of nodes; curves: Ideal, ARPACK, SLEPc]
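The speed-up definition above is a one-line computation. The sketch below adds parallel efficiency, $E_p = S_p / p$, which is not on the slide but is the usual companion metric. Function names are hypothetical and the timings in the usage note are made-up numbers, not measurements from this talk.

```c
#include <assert.h>

/* Speed-up S_p = T_s / T_p: serial time divided by parallel time. */
double speedup(double t_serial, double t_parallel)
{
    assert(t_serial > 0.0 && t_parallel > 0.0);
    return t_serial / t_parallel;
}

/* Parallel efficiency E_p = S_p / p for p processors (1.0 is ideal). */
double efficiency(double t_serial, double t_parallel, int p)
{
    assert(p > 0);
    return speedup(t_serial, t_parallel) / (double)p;
}
```

For example, a run that takes 120 s serially and 30 s on 8 processors has a speed-up of 4 and an efficiency of 0.5.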

43 Molecular Clusters
Goal: Analysis of high-nuclearity spin clusters
- Collaboration with a group in ICMOL-U. Valencia
- Parallel version of MAGPACK using SLEPc
Features:
- Irreducible Tensor Operators (ITO) technique
- All kinds of anisotropic and isotropic spin interactions
- Bulk magnetic properties (magnetic susceptibility and magnetization)
- Spectroscopic properties (inelastic neutron scattering spectra)
- Validation
[Ramos, R., Cardona-Serra, Clemente-Juan, 2010]

44 Molecular Clusters - Isotropic Analysis
Study: 8 manganese ring with antiferromagnetic coupling (n=135,954)
Isotropic analysis: no mixing between different spins
- The matrix has a block diagonal structure (one block per spin)
- Can diagonalize each spin matrix separately
- Computing 1000 eigenpairs takes 12 hours on 1 processor

46 Molecular Clusters - Anisotropic Analysis
Chain of Ni atoms with antiferromagnetic interaction
- We use closed rings to simulate infinite chains
- Study: magnetic susceptibility in a ring of 10 Nickel(II) ion centers (n=59,049)

47 Molecular Clusters - Anisotropic Analysis (Results)
Performance analysis: solve the Nickel ring with 600 eigenvalues
Breakdown of execution time (in seconds)*

    Step                                        Time    Percentage
    Generation of energy matrix                         %
    First diagonalization                               %
    Generation of Zeeman matrices                       %
    Thermodynamic properties                            %
      (6 diagonalizations and 4 matrix AXPYs)
    TOTAL                                               %

*Results in CaesarAugusta with 8 nodes (16 procs)
- Matrix-vector multiplication accounts for 67 % of diagonalization
- Different matrix storage formats: symmetric, implicit $A + \gamma B$

48 Molecular Clusters - Anisotropic Analysis (Results)
[Figure: speedup versus number of nodes, two panels; first panel curves: Ideal, aijfull, aijhalf, sbaij; second panel curves: Ideal, aijhalf, sbaij, sbaijshell, sbaijreuse]

49 Molecular Clusters - Real Use Case: Fe8
Study: octanuclear iron, Fe8 (n=1,679,616)
- Classical example of single molecule magnet, with interesting properties
- Magnetic moment relaxation at low temperatures
- Strong axial anisotropy
No one has been able to simulate this molecule
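The matrix sizes quoted for the anisotropic cases are consistent with a full spin state space of dimension $(2s+1)^N$ for $N$ identical centers. The sketch below checks this arithmetic under the assumption, not stated on the slides, that each Ni(II) center has spin 1 (3 states) and each Fe(III) center has spin 5/2 (6 states). The function name is hypothetical.

```c
#include <assert.h>

/* Dimension (2s+1)^N of the spin state space for N identical centers,
   where states_per_center = 2s+1. No overflow check: intended for the
   small exponents used here. */
unsigned long long spin_space_dimension(unsigned states_per_center,
                                        unsigned centers)
{
    unsigned long long dim = 1ULL;
    assert(states_per_center > 0);
    for (unsigned k = 0; k < centers; k++)
        dim *= states_per_center;
    return dim;
}
```

With these assumptions, 10 nickel centers give 3^10 = 59,049 and 8 iron centers give 6^8 = 1,679,616, matching the values of n on the slides. The isotropic manganese case uses a reduced, block-structured space, so this formula does not apply to it.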

50 Molecular Clusters - Real Use Case: Fe8
Computing 200 lowest energy levels took 7 days on 64 processors (Tirant)
Results match experimental observation
- Fundamental state of S=10
- Two energy wells
- At temperatures below 0.36K only states S=10 and S=9 are populated

52 Plasma Physics
- Plasma turbulence in a tokamak determines its energy confinement
- Simulation based on the nonlinear gyrokinetic equations
- GENE parallel code, scalable to 1000s of processors
- Two main goals: a deeper understanding of fundamental physics issues and direct comparisons with experiments

53 Turbulent Fluctuations Behaves more like 2D fluid turbulence due to strong background magnetic field

54 Turbulent Fluctuations (Cross Section) Inboard side streamer formation Outboard side streamer formation gradients fluctuations transport

55 The Plasma Turbulence Problem is Really 5D
Dilute and/or hot plasmas are almost collisionless: (3D) fluid theory is not applicable, must use a (reduced) kinetic description!
Vlasov-Maxwell Equations:
$$\left[ \frac{\partial}{\partial t} + v \cdot \frac{\partial}{\partial x} + \frac{q}{m}\left( E + \frac{v}{c} \times B \right) \cdot \frac{\partial}{\partial v} \right] f(x, v, t) = 0$$
Removing the fast gyromotion [Frieman, Chen, Lee, Hahm et al., 1980s]
Charged rings as quasiparticles; gyrocenter coordinates
Nonlinear integro-differential equations in 5 dimensions...

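56 The Nonlinear Gyrokinetic Equations
$f = f(X, v_\parallel, \mu; t)$
Advection/Conservation Equation:
$$\frac{\partial f}{\partial t} + \dot{X} \cdot \frac{\partial f}{\partial X} + \dot{v}_\parallel \frac{\partial f}{\partial v_\parallel} = 0$$
$X$ = gyrocenter position, $v_\parallel$ = parallel velocity, $\mu$ = magnetic moment
$$\dot{X} = v_\parallel b + \frac{B}{B_\parallel^*} \left( \frac{v_\parallel}{B} \bar{B}_{1\perp} + \frac{c}{B^2}\, \bar{E}_1 \times B + \frac{\mu}{m\Omega}\, b \times \nabla (B + \bar{B}_{1\parallel}) + \frac{v_\parallel^2}{\Omega} (\nabla \times b)_\perp \right)$$
$$\dot{v}_\parallel = \frac{\dot{X}}{m v_\parallel} \cdot \left( e\bar{E}_1 - \mu \nabla (B + \bar{B}_{1\parallel}) \right)$$
and appropriate field equations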

57 GENE: Numerical Solution
Initial value solver
- Method of lines: space/velocity discretized first
- 6-step, 4th-order explicit Runge-Kutta for time integration
An explicit time scheme is not globally stable, so the timestep must be kept below an upper limit
- To maximize efficiency, the largest possible timestep should be used
- The optimal timestep can be estimated from the eigenspectrum of the linearized equation
- This estimate should be very accurate
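The link between an eigenvalue and the timestep limit can be illustrated with a small stability calculation. The sketch below is not GENE's scheme: it swaps in the classical 4-stage, 4th-order Runge-Kutta method, whose stability polynomial is $R(z) = 1 + z + z^2/2 + z^3/6 + z^4/24$, and finds by bisection the largest $\Delta t$ with $|R(\Delta t\,\lambda)| \le 1$ for one eigenvalue $\lambda$. It assumes the stable step sizes along the ray $\Delta t\,\lambda$ form a single interval starting at 0, and it returns 0 for growing modes ($\mathrm{Re}\,\lambda > 0$). All names are hypothetical.

```c
#include <assert.h>

/* |R(z)|^2 for the classical RK4 stability polynomial
   R(z) = 1 + z + z^2/2 + z^3/6 + z^4/24, with z = zr + i*zi,
   evaluated by Horner's rule in explicit real/imaginary arithmetic. */
static double rk4_amplification_sq(double zr, double zi)
{
    const double c[5] = {1.0 / 24.0, 1.0 / 6.0, 0.5, 1.0, 1.0};
    double p_re = c[0], p_im = 0.0;
    for (int k = 1; k < 5; k++) {
        double t_re = p_re * zr - p_im * zi + c[k];   /* p = p*z + c[k] */
        double t_im = p_re * zi + p_im * zr;
        p_re = t_re;
        p_im = t_im;
    }
    return p_re * p_re + p_im * p_im;
}

/* Largest dt with |R(dt*lambda)| <= 1 for classical RK4 (illustrative).
   Assumes the stable values of dt form one interval starting at 0. */
double rk4_max_timestep(double lambda_re, double lambda_im)
{
    const double slack = 1e-12;             /* tolerance for rounding */
    double abs_re = lambda_re < 0.0 ? -lambda_re : lambda_re;
    double abs_im = lambda_im < 0.0 ? -lambda_im : lambda_im;
    assert(abs_re + abs_im > 0.0);
    if (lambda_re > 0.0) return 0.0;        /* growing mode: no stable dt */

    double lo = 0.0, hi = 1.0 / (abs_re + abs_im);
    /* Grow hi until the step is unstable (capped number of doublings). */
    for (int k = 0; k < 64; k++) {
        if (rk4_amplification_sq(hi * lambda_re, hi * lambda_im) > 1.0 + slack)
            break;
        lo = hi;
        hi *= 2.0;
    }
    /* Bisect between the last stable and the first unstable step. */
    for (int k = 0; k < 100; k++) {
        double mid = 0.5 * (lo + hi);
        if (rk4_amplification_sq(mid * lambda_re, mid * lambda_im) <= 1.0 + slack)
            lo = mid;
        else
            hi = mid;
    }
    return lo;
}
```

The limit scales inversely with the eigenvalue magnitude, so the largest-magnitude eigenvalue of the linearized operator sets the bound; for a different Runge-Kutta scheme only the polynomial coefficients would change.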

58 Plasma Physics - Use of SLEPc
Knowledge of the spectrum of the linearized equation $Ax = \lambda x$
- Complex, non-Hermitian eigenproblem
- The matrix is not built explicitly
- Sizes ranging from a few millions to a billion
Uses:
1. Largest magnitude eigenvalue to estimate optimal timestep
2. Track sub-dominant instabilities (rightmost eigenvalues)
[R., Kammerer, Merz, Jenko, 2010]

59 Sub-dominant Modes
Usually, turbulence codes address the initial value problem, so linear calculations only allow determining the fastest growing eigenmode
GENE can also be run as a linear eigenvalue solver, which also gives access to sub-dominant instabilities that can affect the turbulence properties
[Figure: ω and γ versus R/L_{T,e}, for two values of R/L_n (one panel labelled R/L_n = 4.4)]
Challenging from a numerical point of view; solved with SLEPc

61 Plasma Physics - Performance Results Test case 1: tokamak model with a realistic species configuration (deuterium, tritium, helium and electrons) Test case 2: stellarator device with realistic geometry (W7-X)

62 Plasma Physics - Strong Scaling
Speedup on Tirant (matrix size is 442,368 in both cases)
[Figure: speedup versus number of processors, one panel per test case; curves: KS, JD, ideal]
KS=Krylov-Schur, JD=Jacobi-Davidson

63 Plasma Physics - Weak Scaling
- On 1 processor JD is about 4 times faster than KS for this problem
- Both methods scale well
The plot shows weak scaling results, with increasing problem size (largest size is 884,736)
[Figure: time versus number of processors; curves: KS, JD]

64 Wrap Up
SLEPc highlights:
- Growing list of eigensolvers (also QEP and SVD solvers)
- Seamlessly integrated spectral transformation
- Easy programming with PETSc's object-oriented style
- Data-structure neutral implementation
- Run-time flexibility
- Portability to a wide range of parallel platforms
- Usable from code written in C, C++, Fortran, Python
- Extensive documentation

65 More Information
Homepage: http://www.grycap.upv.
Contact
