Opportunities for ELPA to Accelerate the Solution of the Bethe-Salpeter Eigenvalue Problem

Size: px
Start display at page:

Download "Opportunities for ELPA to Accelerate the Solution of the Bethe-Salpeter Eigenvalue Problem"

Transcription

1 Opportunities for ELPA to Accelerate the Solution of the Bethe-Salpeter Eigenvalue Problem Peter Benner, Andreas Marek, Carolin Penke August 16, 2018 ELSI Workshop 2018 Partners:

2 The Problem The Bethe-Salpeter Eigenvalue Problem Find eigenvalues, right and left eigenvectors for H BS x = λx with [ ] [ ] A B A B H BS = = B Ā B H A T, A = A H, B = B T C n n are dense. penke@mpi-magdeburg.mpg.de C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 2/25

3 The Problem The Bethe-Salpeter Eigenvalue Problem Find eigenvalues, right and left eigenvectors for H BS x = λx with [ ] [ ] A B A B H BS = = B Ā B H A T, A = A H, B = B T C n n are dense. Comes up in quantum chemical simulations. n is proportional to n 2 e where n e is the number of electrons in the system. n can become very large (> 50, 000) penke@mpi-magdeburg.mpg.de C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 2/25

4 The Problem The Bethe-Salpeter Eigenvalue Problem Find eigenvalues, right and left eigenvectors for H BS x = λx with [ ] [ ] A B A B H BS = = B Ā B H A T, A = A H, B = B T C n n are dense. Comes up in quantum chemical simulations. n is proportional to n 2 e where n e is the number of electrons in the system. n can become very large (> 50, 000) Parallel and scalable algorithms running on supercomputers necessary. penke@mpi-magdeburg.mpg.de C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 2/25

5 Applications in Quantum Chemistry Eigenvalues λ i, right eigenvectors x i and left eigenvectors y i are used to compute Spectral Density φ(ω) = 1 2n Optical Absorption Spectrum 2n j=1 δ(ω λ j ), ɛ + (ω) = n j=1 (d H r x j )(y H j d l ) y H j x j δ(ω λ j ). penke@mpi-magdeburg.mpg.de C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 3/25

6 Applications in Quantum Chemistry Eigenvalues λ i, right eigenvectors x i and left eigenvectors y i are used to compute Spectral Density φ(ω) = 1 2n Optical Absorption Spectrum 2n j=1 δ(ω λ j ), ɛ + (ω) = n j=1 (d H r x j )(y H j d l ) y H j x j δ(ω λ j ). We are interested in all (or most) eigenpairs. penke@mpi-magdeburg.mpg.de C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 3/25

7 Properties of (definite) Bethe-Salpeter EVP Due to the special structure eigenvalues appear in quadruples (λ, λ, λ, λ). [Benner, Faßbender, Yang 14/ 18] ir R penke@mpi-magdeburg.mpg.de C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 4/25

8 Properties of (definite) Bethe-Salpeter EVP Due to the special structure eigenvalues appear in quadruples (λ, λ, λ, λ). [Benner, Faßbender, Yang 14/ 18] [ ] [ ] In 0 A B H BS is called definite if H 0 I BS = > 0, holds in many n B Ā physical settings. Here eigenvalues come in real pairs and it holds Theorem [Shao, da Jornada, Yang, Deslippe, Louie 15] There exist X 1, X 2 C n n, and λ 1,..., λ n R +, Λ + = diag{λ 1,..., λ n}, s.t. [ [ Λ+ H BS X = X, Y Λ+] H Λ+ H BS = Y Λ+] H, [ ] [ ] X1 X where X = 2 X1 X, Y = 2. X 2 X 1 X 2 X 1 Y H X = I 2n penke@mpi-magdeburg.mpg.de C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 4/25

9 A Structure-Preserving Method [Shao, da Jornada, Yang, Deslippe, Louie 15] A direct method is implemented in BSEPACK 1. The structure-preserving acquisition of all eigenpairs in high precision relies on a connection to Hamiltonian matrices. Theorem Let Q = 1 [ In 2 I n [ Q H A B ] ii n, then ii n ] B Q = i Ā [ ] Im(A + B) Re(A B) =: ih, Re(A + B) Im(A B) where H is real Hamiltonian, i.e. JH = (JH) T with J = [ ] 0 I. I penke@mpi-magdeburg.mpg.de C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 5/25

10 The resulting algorithm Algorithm 1 Algorithm for the complex Bethe-Salpeter eigenvalue problem Require: A = A H, B = B T C n n, s.t. [ ] In 0 H 0 I BS = n Ensure: X 1, X 2 C n n and Λ + = diag{λ 1,..., λ n} satisfying H [ ] Re(A + B) Im(A B) 1: Construct M = Im(A + B) Re(A B) 2: Compute the Cholesky factorization ] M = LL T L [ ] A B > 0 B Ā [ ] X1 = X 2 [ ] X1 Λ X +. 2 [ 0 3: Construct W = L T In I n 0 4: Compute the spectral decomposition iw = [ ] Z + Z diag{λ+, Λ [ ] H. +} Z + Z 5: Set [ ] X1 = X 2 [ ] In 0 QLZ 0 I +Λ 1/2 + n penke@mpi-magdeburg.mpg.de C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 6/25

11 The resulting algorithm Algorithm 1 Algorithm for the complex Bethe-Salpeter eigenvalue problem Require: A = A H, B = B T C n n, s.t. [ ] In 0 H 0 I BS = n Ensure: X 1, X 2 C n n and Λ + = diag{λ 1,..., λ n} satisfying H [ ] Re(A + B) Im(A B) 1: Construct M = Im(A + B) Re(A B) 2: Compute the Cholesky factorization ] M = LL T L [ ] A B > 0 B Ā [ ] X1 = X 2 [ ] X1 Λ X +. 2 [ 0 3: Construct W = L T In I n 0 4: Compute the spectral decomposition iw = [ ] Z + Z diag{λ+, Λ [ ] H. +} Z + Z 5: Set [ ] X1 = X 2 [ ] In 0 QLZ 0 I +Λ 1/2 + n Main workload: Solve a strictly imaginary Hermitian eigenvalue problem. penke@mpi-magdeburg.mpg.de C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 6/25

12 The resulting algorithm Algorithm 1 Algorithm for the complex Bethe-Salpeter eigenvalue problem Require: A = A H, B = B T C n n, s.t. [ ] In 0 H 0 I BS = n Ensure: X 1, X 2 C n n and Λ + = diag{λ 1,..., λ n} satisfying H [ ] Re(A + B) Im(A B) 1: Construct M = Im(A + B) Re(A B) 2: Compute the Cholesky factorization ] M = LL T L [ ] A B > 0 B Ā [ ] X1 = X 2 [ ] X1 Λ X +. 2 [ 0 3: Construct W = L T In I n 0 4: Compute the spectral decomposition iw = [ ] Z + Z diag{λ+, Λ [ ] H. +} Z + Z 5: Set [ ] X1 = X 2 [ ] In 0 QLZ 0 I +Λ 1/2 + n Main workload: Solve a strictly imaginary Hermitian eigenvalue problem. Imanginary part is skew-symmetric. penke@mpi-magdeburg.mpg.de C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 6/25

13 Implementation W is skew-symmetric can be reduced to tridiagonal form (e.g via Householder transformations): 0 α 1. UWU T α = T = α2n 2 α 2n 2 0 penke@mpi-magdeburg.mpg.de C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 7/25

14 Implementation W is skew-symmetric can be reduced to tridiagonal form (e.g via Householder transformations): 0 α 1. UWU T α = T = α2n 2 α 2n 2 0 Can be transformed to symmetric form using D = diag{1, i, i 2,..., i 2n } 0 α 1 0 α 1. id H α α D = α2n α2n 2 α 2n 2 0 α 2n 2 0 penke@mpi-magdeburg.mpg.de C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 7/25

15 Implementation Reduction to tridiagonal form by editing ScaLAPACK s routine for tridiagonal reduction of symmetric matrices: PDSYTRD PDSSTRD penke@mpi-magdeburg.mpg.de C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 8/25

16 Implementation Reduction to tridiagonal form by editing ScaLAPACK s routine for tridiagonal reduction of symmetric matrices: PDSYTRD PDSSTRD Solve symmetric tridiagonal EVP via 1. Bisection method (PDSTEBZ) 2. Inverse Iteration (PDSTEIN). penke@mpi-magdeburg.mpg.de C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 8/25

17 Implementation Reduction to tridiagonal form by editing ScaLAPACK s routine for tridiagonal reduction of symmetric matrices: PDSYTRD PDSSTRD Solve symmetric tridiagonal EVP via 1. Bisection method (PDSTEBZ) 2. Inverse Iteration (PDSTEIN). Back transformation of eigenvectors: [ ] [ ] X1 In 0 = QLUDV X 2 0 I + Λ 1/2 +, n where all matrices but D = diag{1, i, i 2..., i 2n } are real. penke@mpi-magdeburg.mpg.de C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 8/25

18 The ELPA Library BSEPACK is just proof-of-concept, not performance optimized. Room for improvement by using better libraries! The ELPA Project 2 Eigenvalue SoLvers for Petaflop-Applications The publicly available ELPA library provides highly efficient and highly scalable direct eigensolvers for symmetric matrices. Though especially designed for use for PetaFlop/s applications solving large problem sizes on massively parallel supercomputers, ELPA eigensolvers have proven to be also very efficient for smaller matrices. Mainly developed at Max Planck Computing and Data Facility (MPCDF) in Garching. Original application area: electronic structure calculations. 2 penke@mpi-magdeburg.mpg.de C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 9/25

19 Tridiagonalisation in ELPA Figure: ELPA employs a two-step tridiagonalisation for symmetric or Hermitian matrices. [Marek et al. 14] penke@mpi-magdeburg.mpg.de C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 10/25

20 ELPA vs. ScaLAPACK BLAS lvl. 3 routines for redution to banded form. More data locality, less communication, highly efficient GEMM routines! Carefully crafted communication patterns in reduction to tridiagonal form and eigenvector back transformation. Solution of Tridiagonal Systems: Divide-and-Conquer instead of Bisection Method and Inverse Iteration. OpenMP and GPU support. Better Performance and Scalability! penke@mpi-magdeburg.mpg.de C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 11/25

21 Using ELPA to enhance BSEPACK Two promising points of attack for ELPA: 1. Diagonalize complex Hermitian matrix iw = ZΛZ H using ELPA + Main portion of the workload could benefit from performance and scalability of ELPA. Uses complex arithmetic, while BSEPACK mainly work on real data. 2. Extend the ELPA algorithm for skew-symmetric matrices, just as ScaLAPACK was extended in BSEPACK. + Easy from mathematical point of view Major revision of ELPA software stack necessary. penke@mpi-magdeburg.mpg.de C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 12/25

22 Using ELPA to enhance BSEPACK Two promising points of attack for ELPA: 1. Diagonalize complex Hermitian matrix iw = ZΛZ H using ELPA + Main portion of the workload could benefit from performance and scalability of ELPA. Uses complex arithmetic, while BSEPACK mainly work on real data. 2. Extend the ELPA algorithm for skew-symmetric matrices, just as ScaLAPACK was extended in BSEPACK. + Easy from mathematical point of view Major revision of ELPA software stack necessary. penke@mpi-magdeburg.mpg.de C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 12/25

23 Numerical Test Setup System DRACO Located at Max Planck Computing and Data Facility HPC Extension to HPC System HYDRA 880 nodes, Intel Haswell Xeon E5-2698, GHz 128 GB main memory per node Interconnect: fast InfiniBand FDR14 network penke@mpi-magdeburg.mpg.de C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 13/25

24 A C n n and B C n n where initialized with random complex values. [ ] A B Diagonal of A positive and scaled up, s.t. is positive definite B Ā (Gershgorin Circle Theorem). To work on a matrix on a distributed memory machine, it is divided into many subblocks of a certain block sizes n b n b. penke@mpi-magdeburg.mpg.de C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 14/25

25 A C n n and B C n n where initialized with random complex values. [ ] A B Diagonal of A positive and scaled up, s.t. is positive definite B Ā (Gershgorin Circle Theorem). To work on a matrix on a distributed memory machine, it is divided into many subblocks of a certain block sizes n b n b. Numerical Tests: 1. Test impact of block size on performance for n = 25, Test strong scalability for medium sized matrices n = 25, Compare runtimes for larger matrices up to n = 75, 000. penke@mpi-magdeburg.mpg.de C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 14/25

26 Block Size in BSEPACK and BSEPACK+ELPA 5,000 4,000 BSEPACK BSEPACK+ELPA Runtimes in s 3,000 2,000 1, Block Size n b Figure: Runtimes for different block sizes at n = 25, 000. penke@mpi-magdeburg.mpg.de C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 15/25

27 Scalability of BSEPACK and BSEPACK+ELPA 6,000 BSEPACK, n b = 64 BSEPACK n b = 256 BSEPACK + ELPA, n b = 64 Runtime in s 4,000 2, Number of Nodes Figure: Strong Scalability for medium sized matrix (n = 25, 000) in BSEPACK and BSEPACK enhanced with ELPA. 32 threads were used per node. penke@mpi-magdeburg.mpg.de C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 16/25

28 10 4 Scalability of BSEPACK and BSEPACK+ELPA Runtime in s BSEPACK Tridiagonalization (ScaLAPACK) BSEPACK + ELPA Hermitian Solve (ELPA) Perfect Scaling Number of Nodes Figure: Strong Scalability for medium sized matrix (n = 25, 000) in BSEPACK and BSEPACK enhanced with ELPA. 32 threads were used per node. penke@mpi-magdeburg.mpg.de C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 17/25

29 Runtimes for large matrices Runtime in s 6,000 4,000 2,000 BSEPACK, n b = 64 BSEPACK, n b = 256 BSEPACK + ELPA, n b = Size of submatrices A, B C n n 10 4 Figure: Runtimes of BSEPACK and BSEPACK+ELPA for large matrices of size 2n 2n. penke@mpi-magdeburg.mpg.de C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 18/25

30 Runtimes for large matrices 10 4 Runtime in s BSEPACK n b = 64 BSEPACK n b = 256 BSEPACK + ELPA Exponent 2.79 Exponent 2.40 Exponent ,000 15,000 25,000 40,000 65,000 Size of submatrices A, B C n n Figure: Runtimes of BSEPACK and BSEPACK+ELPA for large matrices of size 2n 2n. penke@mpi-magdeburg.mpg.de C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 19/25

31 Using ELPA to enhance BSEPACK Two promising points of attack for ELPA: 1. Diagonalize complex Hermitian matrix iw = ZΛZ H using ELPA + Main portion of the workload could benefit from performance and scalability of ELPA. Uses complex arithmetic, while BSEPACK mainly work on real data. 2. Extend the ELPA algorithm for skew-symmetric matrices, just as ScaLAPACK was extended in BSEPACK. + Easy from mathematical point of view Major revision of ELPA software stack necessary. penke@mpi-magdeburg.mpg.de C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 20/25

32 ELPA for Skew-Symmetric Matrices Stays the same: Computation of Householder vectors. Application to off-diagonal blocks. Tridiagonal solve. Most of the back transformation of eigenvectors. All communication and synchronizations. C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 21/25

33 ELPA for Skew-Symmetric Matrices Stays the same: Computation of Householder vectors. Application to off-diagonal blocks. Tridiagonal solve. Most of the back transformation of eigenvectors. All communication and synchronizations. Change wherever symmetry is implicitly assumed: Updates on blocks including the diagonal: A (I τvv T )A(I τvv T ) = A v (τv T A 0.5τ 2 v T Avv T ) }{{} u T 1 (Aτv 0.5τ 2 vv T Av) }{{} u 2 A = A T u 2 = u 1, A = A T u 2 = u 1 v T penke@mpi-magdeburg.mpg.de C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 21/25

34 ELPA for Skew-Symmetric Matrices Use skew-symmetric BLAS routines provided by BSEPACK. DSYR2 DSSR2 DSYMV DSSMV Multiplication with D = diag{1, i, i 2,..., i 2n } in back transformation. penke@mpi-magdeburg.mpg.de C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 22/25

35 ELPA for Skew-Symmetric Matrices Use skew-symmetric BLAS routines provided by BSEPACK. DSYR2 DSSR2 DSYMV DSSMV Multiplication with D = diag{1, i, i 2,..., i 2n } in back transformation. Preliminary results from a smaller compute server. Bruno 2 Intel Haswell Xeon E5-2640v3, 8 cores 32 GB main memory per CPU penke@mpi-magdeburg.mpg.de C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 22/25

36 Runtimes for large matrices Runtime in s 2,500 2,000 1,500 1, BSEPACK, n b = 64 BSEPACK + complex ELPA, n b = 64 BSEPACK + skew-symmetric ELPA, n b = Size of submatrices A, B C n n 10 4 Figure: Preliminary results matrices of size 2n 2n running on 16 cores. penke@mpi-magdeburg.mpg.de C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 23/25

37 Easy further improvements: Rank-2 Update becomes easier for skew-symmetric matrices: A (I τvv T )A(I τvv T ) = A v(τv T A 0.5τ 2 v} T {{ Av} v T ) =0 (Aτv 0.5τ 2 v v} T {{ Av} )v T =0 Eigenvector resulting from tridiagonal solve have to be multiplied by complex diagonal D before back transformation. Instead of applying Householder transformation directly, we apply them to identity matrix and then multiply. Easier to implement but more FLOPs. penke@mpi-magdeburg.mpg.de C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 24/25

38 Future Work ELPA for skew-symmetric matrices on production level (including OpenMP and GPU support) HPC implementations of algorithms for Hamiltonian matrices. Structure preserving Divide-and-Conquer scheme based on the matrix disk function / doubling algorithm [Bai, Demmel, Gu 97] and permuted Lagrangian Graph Bases [Mehrmann, Poloni 12]. Thank you for your attention! penke@mpi-magdeburg.mpg.de C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 25/25

39 The ELPA library can be found at C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 25/25

40 M. Shao, F. H. da Jornada, C. Yang, J. Deslippe, and S. G. Louie. Structure preserving parallel algorithms for solving the Bethe-Salpeter eigenvalue problem. Linear Algebra and its Applications, 488(Supplement C): , P. Benner, H. Faßbender, and C. Yang. Some remarks on the complex J-symmetric eigenproblem. Preprint MPIMD/15-12, Max Planck Institute Magdeburg, July Available from T. Auckenthaler, V. Blum, H.-J. Bungartz, T. Huckle, R. Johanni, L. Krmer, B. Lang, H. Lederer, and P.R. Willems. Parallel solution of partial symmetric eigenvalue problems from electronic structure calculations. Parallel Computing, 37(12): , th International Workshop on Parallel Matrix Algorithms and Applications (PMAA 10). Z. Bai, J. Demmel, and M. Gu. An Inverse Free Parallel Spectral Divide and Conquer Algorithm for Nonsymmetric Eigenproblems. 76(3): , Peter Benner, Heike Fabender, and Chao Yang. Some remarks on the complex J-symmetric eigenproblem. Linear Algebra and its Applications, 544: , V. Mehrmann and F. Poloni. Doubling algorithms with permuted Lagrangian graph bases. SIAM J. Matrix Anal. Appl., 33(3): , C. Penke ELPA for the Bethe-Salpeter Eigenvalue Problem 25/25

A hybrid Hermitian general eigenvalue solver

A hybrid Hermitian general eigenvalue solver Available online at www.prace-ri.eu Partnership for Advanced Computing in Europe A hybrid Hermitian general eigenvalue solver Raffaele Solcà *, Thomas C. Schulthess Institute fortheoretical Physics ETHZ,

More information

Verbundprojekt ELPA-AEO. Eigenwert-Löser für Petaflop-Anwendungen Algorithmische Erweiterungen und Optimierungen

Verbundprojekt ELPA-AEO. Eigenwert-Löser für Petaflop-Anwendungen Algorithmische Erweiterungen und Optimierungen Verbundprojekt ELPA-AEO http://elpa-aeo.mpcdf.mpg.de Eigenwert-Löser für Petaflop-Anwendungen Algorithmische Erweiterungen und Optimierungen BMBF Projekt 01IH15001 Feb 2016 - Jan 2019 7. HPC-Statustagung,

More information

Computing least squares condition numbers on hybrid multicore/gpu systems

Computing least squares condition numbers on hybrid multicore/gpu systems Computing least squares condition numbers on hybrid multicore/gpu systems M. Baboulin and J. Dongarra and R. Lacroix Abstract This paper presents an efficient computation for least squares conditioning

More information

Parallel Eigensolver Performance on High Performance Computers

Parallel Eigensolver Performance on High Performance Computers Parallel Eigensolver Performance on High Performance Computers Andrew Sunderland Advanced Research Computing Group STFC Daresbury Laboratory CUG 2008 Helsinki 1 Summary (Briefly) Introduce parallel diagonalization

More information

The ELPA Library Scalable Parallel Eigenvalue Solutions for Electronic Structure Theory and Computational Science

The ELPA Library Scalable Parallel Eigenvalue Solutions for Electronic Structure Theory and Computational Science TOPICAL REVIEW The ELPA Library Scalable Parallel Eigenvalue Solutions for Electronic Structure Theory and Computational Science Andreas Marek 1, Volker Blum 2,3, Rainer Johanni 1,2 ( ), Ville Havu 4,

More information

Jacobi-Based Eigenvalue Solver on GPU. Lung-Sheng Chien, NVIDIA

Jacobi-Based Eigenvalue Solver on GPU. Lung-Sheng Chien, NVIDIA Jacobi-Based Eigenvalue Solver on GPU Lung-Sheng Chien, NVIDIA lchien@nvidia.com Outline Symmetric eigenvalue solver Experiment Applications Conclusions Symmetric eigenvalue solver The standard form is

More information

The Algorithm of Multiple Relatively Robust Representations for Multi-Core Processors

The Algorithm of Multiple Relatively Robust Representations for Multi-Core Processors Aachen Institute for Advanced Study in Computational Engineering Science Preprint: AICES-2010/09-4 23/September/2010 The Algorithm of Multiple Relatively Robust Representations for Multi-Core Processors

More information

A model leading to self-consistent iteration computation with need for HP LA (e.g, diagonalization and orthogonalization)

A model leading to self-consistent iteration computation with need for HP LA (e.g, diagonalization and orthogonalization) A model leading to self-consistent iteration computation with need for HP LA (e.g, diagonalization and orthogonalization) Schodinger equation: Hψ = Eψ Choose a basis set of wave functions Two cases: Orthonormal

More information

The ELPA library: scalable parallel eigenvalue solutions for electronic structure theory and

The ELPA library: scalable parallel eigenvalue solutions for electronic structure theory and Home Search Collections Journals About Contact us My IOPscience The ELPA library: scalable parallel eigenvalue solutions for electronic structure theory and computational science This content has been

More information

A knowledge-based approach to high-performance computing in ab initio simulations.

A knowledge-based approach to high-performance computing in ab initio simulations. Mitglied der Helmholtz-Gemeinschaft A knowledge-based approach to high-performance computing in ab initio simulations. AICES Advisory Board Meeting. July 14th 2014 Edoardo Di Napoli Academic background

More information

Parallel Eigensolver Performance on High Performance Computers 1

Parallel Eigensolver Performance on High Performance Computers 1 Parallel Eigensolver Performance on High Performance Computers 1 Andrew Sunderland STFC Daresbury Laboratory, Warrington, UK Abstract Eigenvalue and eigenvector computations arise in a wide range of scientific

More information

Accelerating linear algebra computations with hybrid GPU-multicore systems.

Accelerating linear algebra computations with hybrid GPU-multicore systems. Accelerating linear algebra computations with hybrid GPU-multicore systems. Marc Baboulin INRIA/Université Paris-Sud joint work with Jack Dongarra (University of Tennessee and Oak Ridge National Laboratory)

More information

A DIVIDE-AND-CONQUER METHOD FOR THE TAKAGI FACTORIZATION

A DIVIDE-AND-CONQUER METHOD FOR THE TAKAGI FACTORIZATION SIAM J MATRIX ANAL APPL Vol 0, No 0, pp 000 000 c XXXX Society for Industrial and Applied Mathematics A DIVIDE-AND-CONQUER METHOD FOR THE TAKAGI FACTORIZATION WEI XU AND SANZHENG QIAO Abstract This paper

More information

Accelerating computation of eigenvectors in the nonsymmetric eigenvalue problem

Accelerating computation of eigenvectors in the nonsymmetric eigenvalue problem Accelerating computation of eigenvectors in the nonsymmetric eigenvalue problem Mark Gates 1, Azzam Haidar 1, and Jack Dongarra 1,2,3 1 University of Tennessee, Knoxville, TN, USA 2 Oak Ridge National

More information

Structured Krylov Subspace Methods for Eigenproblems with Spectral Symmetries

Structured Krylov Subspace Methods for Eigenproblems with Spectral Symmetries Structured Krylov Subspace Methods for Eigenproblems with Spectral Symmetries Fakultät für Mathematik TU Chemnitz, Germany Peter Benner benner@mathematik.tu-chemnitz.de joint work with Heike Faßbender

More information

Mitglied der Helmholtz-Gemeinschaft. Linear algebra tasks in Materials Science: optimization and portability

Mitglied der Helmholtz-Gemeinschaft. Linear algebra tasks in Materials Science: optimization and portability Mitglied der Helmholtz-Gemeinschaft Linear algebra tasks in Materials Science: optimization and portability ADAC Workshop, July 17-19 2017 Edoardo Di Napoli Outline Jülich Supercomputing Center Chebyshev

More information

Numerical Methods in Matrix Computations

Numerical Methods in Matrix Computations Ake Bjorck Numerical Methods in Matrix Computations Springer Contents 1 Direct Methods for Linear Systems 1 1.1 Elements of Matrix Theory 1 1.1.1 Matrix Algebra 2 1.1.2 Vector Spaces 6 1.1.3 Submatrices

More information

On aggressive early deflation in parallel variants of the QR algorithm

On aggressive early deflation in parallel variants of the QR algorithm On aggressive early deflation in parallel variants of the QR algorithm Bo Kågström 1, Daniel Kressner 2, and Meiyue Shao 1 1 Department of Computing Science and HPC2N Umeå University, S-901 87 Umeå, Sweden

More information

A robust multilevel approximate inverse preconditioner for symmetric positive definite matrices

A robust multilevel approximate inverse preconditioner for symmetric positive definite matrices DICEA DEPARTMENT OF CIVIL, ENVIRONMENTAL AND ARCHITECTURAL ENGINEERING PhD SCHOOL CIVIL AND ENVIRONMENTAL ENGINEERING SCIENCES XXX CYCLE A robust multilevel approximate inverse preconditioner for symmetric

More information

Accelerating computation of eigenvectors in the dense nonsymmetric eigenvalue problem

Accelerating computation of eigenvectors in the dense nonsymmetric eigenvalue problem Accelerating computation of eigenvectors in the dense nonsymmetric eigenvalue problem Mark Gates 1, Azzam Haidar 1, and Jack Dongarra 1,2,3 1 University of Tennessee, Knoxville, TN, USA 2 Oak Ridge National

More information

Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers

Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers UT College of Engineering Tutorial Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers Stan Tomov 1, George Bosilca 1, and Cédric

More information

Exponentials of Symmetric Matrices through Tridiagonal Reductions

Exponentials of Symmetric Matrices through Tridiagonal Reductions Exponentials of Symmetric Matrices through Tridiagonal Reductions Ya Yan Lu Department of Mathematics City University of Hong Kong Kowloon, Hong Kong Abstract A simple and efficient numerical algorithm

More information

Comparing the Efficiency of Iterative Eigenvalue Solvers: the Quantum ESPRESSO experience

Comparing the Efficiency of Iterative Eigenvalue Solvers: the Quantum ESPRESSO experience Comparing the Efficiency of Iterative Eigenvalue Solvers: the Quantum ESPRESSO experience Stefano de Gironcoli Scuola Internazionale Superiore di Studi Avanzati Trieste-Italy 0 Diagonalization of the Kohn-Sham

More information

APPLIED NUMERICAL LINEAR ALGEBRA

APPLIED NUMERICAL LINEAR ALGEBRA APPLIED NUMERICAL LINEAR ALGEBRA James W. Demmel University of California Berkeley, California Society for Industrial and Applied Mathematics Philadelphia Contents Preface 1 Introduction 1 1.1 Basic Notation

More information

A Parallel Divide and Conquer Algorithm for the Symmetric Eigenvalue Problem on Distributed Memory Architectures. F Tisseur and J Dongarra

A Parallel Divide and Conquer Algorithm for the Symmetric Eigenvalue Problem on Distributed Memory Architectures. F Tisseur and J Dongarra A Parallel Divide and Conquer Algorithm for the Symmetric Eigenvalue Problem on Distributed Memory Architectures F Tisseur and J Dongarra 999 MIMS EPrint: 2007.225 Manchester Institute for Mathematical

More information

SPARSE SOLVERS POISSON EQUATION. Margreet Nool. November 9, 2015 FOR THE. CWI, Multiscale Dynamics

SPARSE SOLVERS POISSON EQUATION. Margreet Nool. November 9, 2015 FOR THE. CWI, Multiscale Dynamics SPARSE SOLVERS FOR THE POISSON EQUATION Margreet Nool CWI, Multiscale Dynamics November 9, 2015 OUTLINE OF THIS TALK 1 FISHPACK, LAPACK, PARDISO 2 SYSTEM OVERVIEW OF CARTESIUS 3 POISSON EQUATION 4 SOLVERS

More information

Parallel Eigensolver Performance on the HPCx System

Parallel Eigensolver Performance on the HPCx System Parallel Eigensolver Performance on the HPCx System Andrew Sunderland, Elena Breitmoser Terascaling Applications Group CCLRC Daresbury Laboratory EPCC, University of Edinburgh Outline 1. Brief Introduction

More information

INITIAL INTEGRATION AND EVALUATION

INITIAL INTEGRATION AND EVALUATION INITIAL INTEGRATION AND EVALUATION OF SLATE PARALLEL BLAS IN LATTE Marc Cawkwell, Danny Perez, Arthur Voter Asim YarKhan, Gerald Ragghianti, Jack Dongarra, Introduction The aim of the joint milestone STMS10-52

More information

Performance Evaluation of Some Inverse Iteration Algorithms on PowerXCell T M 8i Processor

Performance Evaluation of Some Inverse Iteration Algorithms on PowerXCell T M 8i Processor Performance Evaluation of Some Inverse Iteration Algorithms on PowerXCell T M 8i Processor Masami Takata 1, Hiroyuki Ishigami 2, Kini Kimura 2, and Yoshimasa Nakamura 2 1 Academic Group of Information

More information

ESLW_Drivers July 2017

ESLW_Drivers July 2017 ESLW_Drivers 10-21 July 2017 Volker Blum - ELSI Viktor Yu - ELSI William Huhn - ELSI David Lopez - Siesta Yann Pouillon - Abinit Micael Oliveira Octopus & Abinit Fabiano Corsetti Siesta & Onetep Paolo

More information

A dissection solver with kernel detection for unsymmetric matrices in FreeFem++

A dissection solver with kernel detection for unsymmetric matrices in FreeFem++ . p.1/21 11 Dec. 2014, LJLL, Paris FreeFem++ workshop A dissection solver with kernel detection for unsymmetric matrices in FreeFem++ Atsushi Suzuki Atsushi.Suzuki@ann.jussieu.fr Joint work with François-Xavier

More information

Preconditioned Parallel Block Jacobi SVD Algorithm

Preconditioned Parallel Block Jacobi SVD Algorithm Parallel Numerics 5, 15-24 M. Vajteršic, R. Trobec, P. Zinterhof, A. Uhl (Eds.) Chapter 2: Matrix Algebra ISBN 961-633-67-8 Preconditioned Parallel Block Jacobi SVD Algorithm Gabriel Okša 1, Marián Vajteršic

More information

A communication-avoiding parallel algorithm for the symmetric eigenvalue problem

A communication-avoiding parallel algorithm for the symmetric eigenvalue problem A communication-avoiding parallel algorithm for the symmetric eigenvalue problem Edgar Solomonik Department of Computer Science University of Illinois at Urbana-Champaign Householder Symposium XX June

More information

A Parallel Bisection and Inverse Iteration Solver for a Subset of Eigenpairs of Symmetric Band Matrices

A Parallel Bisection and Inverse Iteration Solver for a Subset of Eigenpairs of Symmetric Band Matrices A Parallel Bisection and Inverse Iteration Solver for a Subset of Eigenpairs of Symmetric Band Matrices Hiroyui Ishigami, Hidehio Hasegawa, Kinji Kimura, and Yoshimasa Naamura Abstract The tridiagonalization

More information

Last Time. Social Network Graphs Betweenness. Graph Laplacian. Girvan-Newman Algorithm. Spectral Bisection

Last Time. Social Network Graphs Betweenness. Graph Laplacian. Girvan-Newman Algorithm. Spectral Bisection Eigenvalue Problems Last Time Social Network Graphs Betweenness Girvan-Newman Algorithm Graph Laplacian Spectral Bisection λ 2, w 2 Today Small deviation into eigenvalue problems Formulation Standard eigenvalue

More information

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 Instructions Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 The exam consists of four problems, each having multiple parts. You should attempt to solve all four problems. 1.

More information

Sparse BLAS-3 Reduction

Sparse BLAS-3 Reduction Sparse BLAS-3 Reduction to Banded Upper Triangular (Spar3Bnd) Gary Howell, HPC/OIT NC State University gary howell@ncsu.edu Sparse BLAS-3 Reduction p.1/27 Acknowledgements James Demmel, Gene Golub, Franc

More information

Recent implementations, applications, and extensions of the Locally Optimal Block Preconditioned Conjugate Gradient method LOBPCG

Recent implementations, applications, and extensions of the Locally Optimal Block Preconditioned Conjugate Gradient method LOBPCG MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Recent implementations, applications, and extensions of the Locally Optimal Block Preconditioned Conjugate Gradient method LOBPCG Knyazev,

More information

Block Iterative Eigensolvers for Sequences of Dense Correlated Eigenvalue Problems

Block Iterative Eigensolvers for Sequences of Dense Correlated Eigenvalue Problems Mitglied der Helmholtz-Gemeinschaft Block Iterative Eigensolvers for Sequences of Dense Correlated Eigenvalue Problems Birkbeck University, London, June the 29th 2012 Edoardo Di Napoli Motivation and Goals

More information

Arnoldi Methods in SLEPc

Arnoldi Methods in SLEPc Scalable Library for Eigenvalue Problem Computations SLEPc Technical Report STR-4 Available at http://slepc.upv.es Arnoldi Methods in SLEPc V. Hernández J. E. Román A. Tomás V. Vidal Last update: October,

More information

A communication-avoiding parallel algorithm for the symmetric eigenvalue problem

A communication-avoiding parallel algorithm for the symmetric eigenvalue problem A communication-avoiding parallel algorithm for the symmetric eigenvalue problem Edgar Solomonik 1, Grey Ballard 2, James Demmel 3, Torsten Hoefler 4 (1) Department of Computer Science, University of Illinois

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences)

AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences) AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences) Lecture 19: Computing the SVD; Sparse Linear Systems Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical

More information

Algorithm for Sparse Approximate Inverse Preconditioners in the Conjugate Gradient Method

Algorithm for Sparse Approximate Inverse Preconditioners in the Conjugate Gradient Method Algorithm for Sparse Approximate Inverse Preconditioners in the Conjugate Gradient Method Ilya B. Labutin A.A. Trofimuk Institute of Petroleum Geology and Geophysics SB RAS, 3, acad. Koptyug Ave., Novosibirsk

More information

Domain Decomposition-based contour integration eigenvalue solvers

Domain Decomposition-based contour integration eigenvalue solvers Domain Decomposition-based contour integration eigenvalue solvers Vassilis Kalantzis joint work with Yousef Saad Computer Science and Engineering Department University of Minnesota - Twin Cities, USA SIAM

More information

GPU accelerated Arnoldi solver for small batched matrix

GPU accelerated Arnoldi solver for small batched matrix 15. 09. 22 GPU accelerated Arnoldi solver for small batched matrix Samsung Advanced Institute of Technology Hyung-Jin Kim Contents - Eigen value problems - Solution - Arnoldi Algorithm - Target - CUDA

More information

Scientific Computing with Case Studies SIAM Press, Lecture Notes for Unit VII Sparse Matrix

Scientific Computing with Case Studies SIAM Press, Lecture Notes for Unit VII Sparse Matrix Scientific Computing with Case Studies SIAM Press, 2009 http://www.cs.umd.edu/users/oleary/sccswebpage Lecture Notes for Unit VII Sparse Matrix Computations Part 1: Direct Methods Dianne P. O Leary c 2008

More information

On the Modification of an Eigenvalue Problem that Preserves an Eigenspace

On the Modification of an Eigenvalue Problem that Preserves an Eigenspace Purdue University Purdue e-pubs Department of Computer Science Technical Reports Department of Computer Science 2009 On the Modification of an Eigenvalue Problem that Preserves an Eigenspace Maxim Maumov

More information

ELSI: A Unified Software Interface for Kohn-Sham Electronic Structure Solvers

ELSI: A Unified Software Interface for Kohn-Sham Electronic Structure Solvers ELSI: A Unified Software Interface for Kohn-Sham Electronic Structure Solvers Victor Yu and the ELSI team Department of Mechanical Engineering & Materials Science Duke University Kohn-Sham Density-Functional

More information

Parallelization of the Molecular Orbital Program MOS-F

Parallelization of the Molecular Orbital Program MOS-F Parallelization of the Molecular Orbital Program MOS-F Akira Asato, Satoshi Onodera, Yoshie Inada, Elena Akhmatskaya, Ross Nobes, Azuma Matsuura, Atsuya Takahashi November 2003 Fujitsu Laboratories of

More information

Numerical Linear and Multilinear Algebra in Quantum Tensor Networks

Numerical Linear and Multilinear Algebra in Quantum Tensor Networks Numerical Linear and Multilinear Algebra in Quantum Tensor Networks Konrad Waldherr October 20, 2013 Joint work with Thomas Huckle QCCC 2013, Prien, October 20, 2013 1 Outline Numerical (Multi-) Linear

More information

A dissection solver with kernel detection for symmetric finite element matrices on shared memory computers

A dissection solver with kernel detection for symmetric finite element matrices on shared memory computers A dissection solver with kernel detection for symmetric finite element matrices on shared memory computers Atsushi Suzuki, François-Xavier Roux To cite this version: Atsushi Suzuki, François-Xavier Roux.

More information

Matrix Eigensystem Tutorial For Parallel Computation

Matrix Eigensystem Tutorial For Parallel Computation Matrix Eigensystem Tutorial For Parallel Computation High Performance Computing Center (HPC) http://www.hpc.unm.edu 5/21/2003 1 Topic Outline Slide Main purpose of this tutorial 5 The assumptions made

More information

FEAST eigenvalue algorithm and solver: review and perspectives

FEAST eigenvalue algorithm and solver: review and perspectives FEAST eigenvalue algorithm and solver: review and perspectives Eric Polizzi Department of Electrical and Computer Engineering University of Masachusetts, Amherst, USA Sparse Days, CERFACS, June 25, 2012

More information

Efficient implementation of the overlap operator on multi-gpus

Efficient implementation of the overlap operator on multi-gpus Efficient implementation of the overlap operator on multi-gpus Andrei Alexandru Mike Lujan, Craig Pelissier, Ben Gamari, Frank Lee SAAHPC 2011 - University of Tennessee Outline Motivation Overlap operator

More information

DELFT UNIVERSITY OF TECHNOLOGY

DELFT UNIVERSITY OF TECHNOLOGY DELFT UNIVERSITY OF TECHNOLOGY REPORT -09 Computational and Sensitivity Aspects of Eigenvalue-Based Methods for the Large-Scale Trust-Region Subproblem Marielba Rojas, Bjørn H. Fotland, and Trond Steihaug

More information

Max Planck Institute Magdeburg Preprints

Max Planck Institute Magdeburg Preprints Thomas Mach Computing Inner Eigenvalues of Matrices in Tensor Train Matrix Format MAX PLANCK INSTITUT FÜR DYNAMIK KOMPLEXER TECHNISCHER SYSTEME MAGDEBURG Max Planck Institute Magdeburg Preprints MPIMD/11-09

More information

Algorithms for Solving the Polynomial Eigenvalue Problem

Algorithms for Solving the Polynomial Eigenvalue Problem Algorithms for Solving the Polynomial Eigenvalue Problem Nick Higham School of Mathematics The University of Manchester higham@ma.man.ac.uk http://www.ma.man.ac.uk/~higham/ Joint work with D. Steven Mackey

More information

A sparse multifrontal solver using hierarchically semi-separable frontal matrices

A sparse multifrontal solver using hierarchically semi-separable frontal matrices A sparse multifrontal solver using hierarchically semi-separable frontal matrices Pieter Ghysels Lawrence Berkeley National Laboratory Joint work with: Xiaoye S. Li (LBNL), Artem Napov (ULB), François-Henry

More information

Symmetric rank-2k update on GPUs and/or multi-cores

Symmetric rank-2k update on GPUs and/or multi-cores Symmetric rank-2k update on GPUs and/or multi-cores Assignment 2, 5DV050, Spring 2012 Due on May 4 (soft) or May 11 (hard) at 16.00 Version 1.0 1 Background and motivation Quoting from Beresford N. Parlett's

More information

Divide and Conquer Symmetric Tridiagonal Eigensolver for Multicore Architectures

Divide and Conquer Symmetric Tridiagonal Eigensolver for Multicore Architectures Divide and Conquer Symmetric Tridiagonal Eigensolver for Multicore Architectures Grégoire Pichon, Azzam Haidar, Mathieu Faverge, Jakub Kurzak To cite this version: Grégoire Pichon, Azzam Haidar, Mathieu

More information

Accurate eigenvalue decomposition of arrowhead matrices and applications

Accurate eigenvalue decomposition of arrowhead matrices and applications Accurate eigenvalue decomposition of arrowhead matrices and applications Nevena Jakovčević Stor FESB, University of Split joint work with Ivan Slapničar and Jesse Barlow IWASEP9 June 4th, 2012. 1/31 Introduction

More information

Jacobi-Davidson Eigensolver in Cusolver Library. Lung-Sheng Chien, NVIDIA

Jacobi-Davidson Eigensolver in Cusolver Library. Lung-Sheng Chien, NVIDIA Jacobi-Davidson Eigensolver in Cusolver Library Lung-Sheng Chien, NVIDIA lchien@nvidia.com Outline CuSolver library - cusolverdn: dense LAPACK - cusolversp: sparse LAPACK - cusolverrf: refactorization

More information

A PARALLEL EIGENSOLVER FOR DENSE SYMMETRIC MATRICES BASED ON MULTIPLE RELATIVELY ROBUST REPRESENTATIONS

A PARALLEL EIGENSOLVER FOR DENSE SYMMETRIC MATRICES BASED ON MULTIPLE RELATIVELY ROBUST REPRESENTATIONS SIAM J. SCI. COMPUT. Vol. 27, No. 1, pp. 43 66 c 2005 Society for Industrial and Applied Mathematics A PARALLEL EIGENSOLVER FOR DENSE SYMMETRIC MATRICES BASED ON MULTIPLE RELATIVELY ROBUST REPRESENTATIONS

More information

Mathematics and Computer Science

Mathematics and Computer Science Technical Report TR-2010-026 On Preconditioned MHSS Iteration Methods for Complex Symmetric Linear Systems by Zhong-Zhi Bai, Michele Benzi, Fang Chen Mathematics and Computer Science EMORY UNIVERSITY On

More information

Lattice Boltzmann simulations on heterogeneous CPU-GPU clusters

Lattice Boltzmann simulations on heterogeneous CPU-GPU clusters Lattice Boltzmann simulations on heterogeneous CPU-GPU clusters H. Köstler 2nd International Symposium Computer Simulations on GPU Freudenstadt, 29.05.2013 1 Contents Motivation walberla software concepts

More information

Index. for generalized eigenvalue problem, butterfly form, 211

Index. for generalized eigenvalue problem, butterfly form, 211 Index ad hoc shifts, 165 aggressive early deflation, 205 207 algebraic multiplicity, 35 algebraic Riccati equation, 100 Arnoldi process, 372 block, 418 Hamiltonian skew symmetric, 420 implicitly restarted,

More information

Sparsity-Preserving Difference of Positive Semidefinite Matrix Representation of Indefinite Matrices

Sparsity-Preserving Difference of Positive Semidefinite Matrix Representation of Indefinite Matrices Sparsity-Preserving Difference of Positive Semidefinite Matrix Representation of Indefinite Matrices Jaehyun Park June 1 2016 Abstract We consider the problem of writing an arbitrary symmetric matrix as

More information

Matrix Computations: Direct Methods II. May 5, 2014 Lecture 11

Matrix Computations: Direct Methods II. May 5, 2014 Lecture 11 Matrix Computations: Direct Methods II May 5, 2014 ecture Summary You have seen an example of how a typical matrix operation (an important one) can be reduced to using lower level BS routines that would

More information

Binding Performance and Power of Dense Linear Algebra Operations

Binding Performance and Power of Dense Linear Algebra Operations 10th IEEE International Symposium on Parallel and Distributed Processing with Applications Binding Performance and Power of Dense Linear Algebra Operations Maria Barreda, Manuel F. Dolz, Rafael Mayo, Enrique

More information

Intel Math Kernel Library (Intel MKL) LAPACK

Intel Math Kernel Library (Intel MKL) LAPACK Intel Math Kernel Library (Intel MKL) LAPACK Linear equations Victor Kostin Intel MKL Dense Solvers team manager LAPACK http://www.netlib.org/lapack Systems of Linear Equations Linear Least Squares Eigenvalue

More information

Using Godunov s Two-Sided Sturm Sequences to Accurately Compute Singular Vectors of Bidiagonal Matrices.

Using Godunov s Two-Sided Sturm Sequences to Accurately Compute Singular Vectors of Bidiagonal Matrices. Using Godunov s Two-Sided Sturm Sequences to Accurately Compute Singular Vectors of Bidiagonal Matrices. A.M. Matsekh E.P. Shurina 1 Introduction We present a hybrid scheme for computing singular vectors

More information

GPU acceleration of Newton s method for large systems of polynomial equations in double double and quad double arithmetic

GPU acceleration of Newton s method for large systems of polynomial equations in double double and quad double arithmetic GPU acceleration of Newton s method for large systems of polynomial equations in double double and quad double arithmetic Jan Verschelde joint work with Xiangcheng Yu University of Illinois at Chicago

More information

Communication-avoiding parallel and sequential QR factorizations

Communication-avoiding parallel and sequential QR factorizations Communication-avoiding parallel and sequential QR factorizations James Demmel, Laura Grigori, Mark Hoemmen, and Julien Langou May 30, 2008 Abstract We present parallel and sequential dense QR factorization

More information

Direct Self-Consistent Field Computations on GPU Clusters

Direct Self-Consistent Field Computations on GPU Clusters Direct Self-Consistent Field Computations on GPU Clusters Guochun Shi, Volodymyr Kindratenko National Center for Supercomputing Applications University of Illinois at UrbanaChampaign Ivan Ufimtsev, Todd

More information

Parallel solution of partial symmetric eigenvalue problems from electronic structure calculations

Parallel solution of partial symmetric eigenvalue problems from electronic structure calculations Parallel solution of partial symmetric eigenvalue problems from electronic structure calculations T. Auckenthaler a, V. Blum b, H.-J. Bungartz a, T. Huckle a, R. Johanni c, L. Krämer d, B. Lang d,, H.

More information

Numerical Methods - Numerical Linear Algebra

Numerical Methods - Numerical Linear Algebra Numerical Methods - Numerical Linear Algebra Y. K. Goh Universiti Tunku Abdul Rahman 2013 Y. K. Goh (UTAR) Numerical Methods - Numerical Linear Algebra I 2013 1 / 62 Outline 1 Motivation 2 Solving Linear

More information

The geometric mean algorithm

The geometric mean algorithm The geometric mean algorithm Rui Ralha Centro de Matemática Universidade do Minho 4710-057 Braga, Portugal email: r ralha@math.uminho.pt Abstract Bisection (of a real interval) is a well known algorithm

More information

Recent Developments in the ELSI Infrastructure for Large-Scale Electronic Structure Theory

Recent Developments in the ELSI Infrastructure for Large-Scale Electronic Structure Theory elsi-interchange.org MolSSI Workshop and ELSI Conference 2018, August 15, 2018, Richmond, VA Recent Developments in the ELSI Infrastructure for Large-Scale Electronic Structure Theory Victor Yu 1, William

More information

Avoiding Communication in Distributed-Memory Tridiagonalization

Avoiding Communication in Distributed-Memory Tridiagonalization Avoiding Communication in Distributed-Memory Tridiagonalization SIAM CSE 15 Nicholas Knight University of California, Berkeley March 14, 2015 Joint work with: Grey Ballard (SNL) James Demmel (UCB) Laura

More information

PFEAST: A High Performance Sparse Eigenvalue Solver Using Distributed-Memory Linear Solvers

PFEAST: A High Performance Sparse Eigenvalue Solver Using Distributed-Memory Linear Solvers PFEAST: A High Performance Sparse Eigenvalue Solver Using Distributed-Memory Linear Solvers James Kestyn, Vasileios Kalantzis, Eric Polizzi, Yousef Saad Electrical and Computer Engineering Department,

More information

EECS 275 Matrix Computation

EECS 275 Matrix Computation EECS 275 Matrix Computation Ming-Hsuan Yang Electrical Engineering and Computer Science University of California at Merced Merced, CA 95344 http://faculty.ucmerced.edu/mhyang Lecture 17 1 / 26 Overview

More information

Quantum ESPRESSO Performance Benchmark and Profiling. February 2017

Quantum ESPRESSO Performance Benchmark and Profiling. February 2017 Quantum ESPRESSO Performance Benchmark and Profiling February 2017 2 Note The following research was performed under the HPC Advisory Council activities Compute resource - HPC Advisory Council Cluster

More information

Communication-avoiding parallel and sequential QR factorizations

Communication-avoiding parallel and sequential QR factorizations Communication-avoiding parallel and sequential QR factorizations James Demmel Laura Grigori Mark Frederick Hoemmen Julien Langou Electrical Engineering and Computer Sciences University of California at

More information

Performance Analysis and Design of a Hessenberg Reduction using Stabilized Blocked Elementary Transformations for New Architectures

Performance Analysis and Design of a Hessenberg Reduction using Stabilized Blocked Elementary Transformations for New Architectures Performance Analysis and Design of a Hessenberg Reduction using Stabilized Blocked Elementary Transformations for New Architectures Khairul Kabir University of Tennessee kkabir@vols.utk.edu Azzam Haidar

More information

Iterative Algorithm for Computing the Eigenvalues

Iterative Algorithm for Computing the Eigenvalues Iterative Algorithm for Computing the Eigenvalues LILJANA FERBAR Faculty of Economics University of Ljubljana Kardeljeva pl. 17, 1000 Ljubljana SLOVENIA Abstract: - We consider the eigenvalue problem Hx

More information

The Solvability Conditions for the Inverse Eigenvalue Problem of Hermitian and Generalized Skew-Hamiltonian Matrices and Its Approximation

The Solvability Conditions for the Inverse Eigenvalue Problem of Hermitian and Generalized Skew-Hamiltonian Matrices and Its Approximation The Solvability Conditions for the Inverse Eigenvalue Problem of Hermitian and Generalized Skew-Hamiltonian Matrices and Its Approximation Zheng-jian Bai Abstract In this paper, we first consider the inverse

More information

Making electronic structure methods scale: Large systems and (massively) parallel computing

Making electronic structure methods scale: Large systems and (massively) parallel computing AB Making electronic structure methods scale: Large systems and (massively) parallel computing Ville Havu Department of Applied Physics Helsinki University of Technology - TKK Ville.Havu@tkk.fi 1 Outline

More information

QR-decomposition. The QR-decomposition of an n k matrix A, k n, is an n n unitary matrix Q and an n k upper triangular matrix R for which A = QR

QR-decomposition. The QR-decomposition of an n k matrix A, k n, is an n n unitary matrix Q and an n k upper triangular matrix R for which A = QR QR-decomposition The QR-decomposition of an n k matrix A, k n, is an n n unitary matrix Q and an n k upper triangular matrix R for which In Matlab A = QR [Q,R]=qr(A); Note. The QR-decomposition is unique

More information

Data Analysis and Manifold Learning Lecture 2: Properties of Symmetric Matrices and Examples

Data Analysis and Manifold Learning Lecture 2: Properties of Symmetric Matrices and Examples Data Analysis and Manifold Learning Lecture 2: Properties of Symmetric Matrices and Examples Radu Horaud INRIA Grenoble Rhone-Alpes, France Radu.Horaud@inrialpes.fr http://perception.inrialpes.fr/ Outline

More information

Numerical Methods I: Eigenvalues and eigenvectors

Numerical Methods I: Eigenvalues and eigenvectors 1/25 Numerical Methods I: Eigenvalues and eigenvectors Georg Stadler Courant Institute, NYU stadler@cims.nyu.edu November 2, 2017 Overview 2/25 Conditioning Eigenvalues and eigenvectors How hard are they

More information

Strassen-like algorithms for symmetric tensor contractions

Strassen-like algorithms for symmetric tensor contractions Strassen-like algorithms for symmetric tensor contractions Edgar Solomonik Theory Seminar University of Illinois at Urbana-Champaign September 18, 2017 1 / 28 Fast symmetric tensor contractions Outline

More information

Utilisation de la compression low-rank pour réduire la complexité du solveur PaStiX

Utilisation de la compression low-rank pour réduire la complexité du solveur PaStiX Utilisation de la compression low-rank pour réduire la complexité du solveur PaStiX 26 Septembre 2018 - JCAD 2018 - Lyon Grégoire Pichon, Mathieu Faverge, Pierre Ramet, Jean Roman Outline 1. Context 2.

More information

A note on eigenvalue computation for a tridiagonal matrix with real eigenvalues Akiko Fukuda

A note on eigenvalue computation for a tridiagonal matrix with real eigenvalues Akiko Fukuda Journal of Math-for-Industry Vol 3 (20A-4) pp 47 52 A note on eigenvalue computation for a tridiagonal matrix with real eigenvalues Aio Fuuda Received on October 6 200 / Revised on February 7 20 Abstract

More information

MagmaDNN High-Performance Data Analytics for Manycore GPUs and CPUs

MagmaDNN High-Performance Data Analytics for Manycore GPUs and CPUs MagmaDNN High-Performance Data Analytics for Manycore GPUs and CPUs Lucien Ng The Chinese University of Hong Kong Kwai Wong The Joint Institute for Computational Sciences (JICS), UTK and ORNL Azzam Haidar,

More information

Solving large Hamiltonian eigenvalue problems

Solving large Hamiltonian eigenvalue problems Solving large Hamiltonian eigenvalue problems David S. Watkins watkins@math.wsu.edu Department of Mathematics Washington State University Adaptivity and Beyond, Vancouver, August 2005 p.1 Some Collaborators

More information

Some notes on efficient computing and setting up high performance computing environments

Some notes on efficient computing and setting up high performance computing environments Some notes on efficient computing and setting up high performance computing environments Andrew O. Finley Department of Forestry, Michigan State University, Lansing, Michigan. April 17, 2017 1 Efficient

More information

SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS. Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA

SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS. Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA 1 SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA 2 OUTLINE Sparse matrix storage format Basic factorization

More information

On GPU Acceleration of Common Solvers for (Quasi-) Triangular Generalized Lyapunov Equations

On GPU Acceleration of Common Solvers for (Quasi-) Triangular Generalized Lyapunov Equations Max Planck Institute Magdeburg Preprints Martin Köhler Jens Saak On GPU Acceleration of Common Solvers for (Quasi-) Triangular Generalized Lyapunov Equations MAX PLANCK INSTITUT FÜR DYNAMIK KOMPLEXER TECHNISCHER

More information

A Newton-Galerkin-ADI Method for Large-Scale Algebraic Riccati Equations

A Newton-Galerkin-ADI Method for Large-Scale Algebraic Riccati Equations A Newton-Galerkin-ADI Method for Large-Scale Algebraic Riccati Equations Peter Benner Max-Planck-Institute for Dynamics of Complex Technical Systems Computational Methods in Systems and Control Theory

More information

LU factorization with Panel Rank Revealing Pivoting and its Communication Avoiding version

LU factorization with Panel Rank Revealing Pivoting and its Communication Avoiding version 1 LU factorization with Panel Rank Revealing Pivoting and its Communication Avoiding version Amal Khabou Advisor: Laura Grigori Université Paris Sud 11, INRIA Saclay France SIAMPP12 February 17, 2012 2

More information