Improving the performance of applied science numerical simulations: an application to Density Functional Theory

Size: px
Start display at page:

Download "Improving the performance of applied science numerical simulations: an application to Density Functional Theory"

Transcription

1 Improving the performance of applied science numerical simulations: an application to Density Functional Theory Edoardo Di Napoli Jülich Supercomputing Center - Institute for Advanced Simulation Forschungszentrum Jülich GmbH Aachen Institute for Advanced Study in Computational Engineering Science RWTH Aachen University Columbia University, March 5th 2013 Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

2 Motivation and Goals Two Facts Often algorithmic libraries are used as black boxes. Very little information coming from the physics of the specific application is exploited by scientific computing codes. Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

3 Motivation and Goals Two Facts Often algorithmic libraries are used as black boxes. Very little information coming from the physics of the specific application is exploited by scientific computing codes. One Objective Exploiting physical information extracted from the simulations in order to: increase the performance of large legacy codes; improve the computational paradigm on which the codes are based. Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

4 Computational science Traditional approach Science-driven workflow Mathematical Model = Algorithmic Structure = Simulations = Physics Math Algorithm Often a literal translation of the mathematical model into a combination of algorithmic tasks usually designed in abstraction from other considerations (kernel optimization, tasks harmonization, etc.); Algorithm Sim The simulation is an end product, the successful implementation of an algorithm into a series of machine accessible operations resulting in the computation of meaningful physical quantities; Math Sim The mathematical model and the simulation are considered as practically disjoint. Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

5 Computational science Modern approach Science-driven + high-performance workflow Mathematical Model Algorithmic Structure Simulations = Physics Math Algorithm The mathematics model undergoes substantial reformulation in order to cast operations in terms of optimized algorithmic structures. At the same time concerns regarding the numerical stability and accuracy are carefully taken into consideration and included in the reformulation; Algorithm Sim Simulations can become a tool to uncover hidden aspects of physical phenomena missed by the mathematical model. Information extracted from a careful analysis of the computations can greatly influence the choice of the optimal algorithm and how it should be used; Math Sim Some information extracted from a careful analysis of the simulations can influence so much the algorithmic choice to suggest a complete revolution in the formulation of the mathematical model. Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

6 Reverse Simulation Math, Algorithm Sim total energy band energy gap conductivity forces, etc. Simulations = = Mathematical model Algorithmic structure FLAPW Paradigm Mathematical Model Algorithmic Structure Feedback from analysis of correlated eigenproblems {Pi} SIM SIM. SIM SIM SIM SIM Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

7 Sequences of eigenproblems in Density Functional Theory A glimpse of the results Speedup Scalability 2.8 Speed up vs. Iteration index for Nev=256 and three distinct matrix sizes 8 BChFSI Strong scalability NaCl n=3893 NaCl n=6217 NaCl n= AuAg n=8970 NaCl n=6217 CaFeAs n=2612 Ideal speed up 2.2 Speed up Speed up Iteration index Number of threads Table: Efficiency Self-consistent cycle Percentage of peak performance % % % % % % Fast: a great improvement of the execution time Efficient: an extensive use of computing resources Scalable: ideal use of parallel computing architectures Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

8 Outline 1 Stating the problem: how sequences of generalized eigenproblems arise in all-electron computations 2 Algorithmic choice = Eigenvectors angle evolution 3 Tailoring the algorithm: Chebyshev Filtered Sub-space Iteration method (ChFSI) 4 Physics-driven performance boost: numerical results Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

9 Outline 1 Stating the problem: how sequences of generalized eigenproblems arise in all-electron computations 2 Algorithmic choice = Eigenvectors angle evolution 3 Tailoring the algorithm: Chebyshev Filtered Sub-space Iteration method (ChFSI) 4 Physics-driven performance boost: numerical results Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

10 The Foundations Investigative framework Quantum Mechanics and its ingredients n H = 2m h2 2 n Z α i i=1 i=1 α x i a α + 1 i<j x i x j Hamiltonian Φ(x 1 ;s 1,x 2 ;s 2,...,x n ;s n ) Wavefunction ( ) n Φ : R 3 {± 1 2 } R high-dimensional anti-symmetric function describes the orbitals of atoms and molecules. In the Born-Oppenheimer approximation, it is the solution of the Electronic Schrödinger Equation H Φ(x 1 ;s 1,x 2 ;s 2,...,x n ;s n ) = E Φ(x 1 ;s 1,x 2 ;s 2,...,x n ;s n ) Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

11 Density Functional Theory (DFT) Hohenberg-Kohn theorem (1964) one-to-one correspondence n(r) V 0 (r) = V 0 (r) = V 0 (r)[n]! a functional E[n] : E 0 = min n E[n] 1 Φ(x 1 ;s 1,x 2 ;s 2,...,x n ;s n ) = Λ i,a φ a (x i ;s i ) 2 Charge density n(r) = a f a φ a (r) 2 3 In the Schrödinger equation the exact Coulomb interaction is substituted with an effective potential V 0 (r) = V I (r) + V H (r) + V xc (r) The high-dimensional Schrödinger equation translates into a set of coupled non-linear low-dimensional self-consistent Kohn-Sham (KS) equations ( ) a solve Ĥ KS φ a (r) = h2 2m 2 + V 0 (r) φ a (r) = ε a φ a (r) Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

12 Kohn-Sham scheme and DFT Self-consistent cycle Typically this set of equations is solved using an iterative self-consistent cycle Initial guess n init (r) = Compute KS potential V 0 (r)[n] Solve KS equations Ĥ KS φ a (r) = ε a φ a (r) No OUTPUT Energy, forces,... Yes = Converged? n (l+1) n (l) < η Compute new density n(r) = a f a φ a (r) 2 In practice this iterative cycle is much more computationally challenging and requires some form of broadly defined discretization. A common way of discretizing the KS equations is to expand the wave functions φ a (r) on a basis set. Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

13 Introduction to FLAPW Linearized augmented plane wave (LAPW) in a Muffin Tin (MT) geometry Lapw φ k,ν (r) = c G k,ν ψ G(k,r) G+k G max 1 e i(k+g)r Ω ψ G (k,r) = α,l,m k ν Bloch vector band index Interstitial (I) [ a α,g lm (k)uα l (r) + bα,g lm (k) uα l ]Y (r) lm (ˆr α ) Muffin Tin (MT) boundary conditions Continuity of wavefunction and its derivative at MT boundary a α,g lm (k) and bα,g lm (k) Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

14 Introduction to FLAPW Linearized augmented plane wave (LAPW) in a Muffin Tin (MT) geometry... cont. The radial functions u α l (r) are the solutions of the atomic Schrödinger equations where the potential retains only the spherical part. [ Ĥsph α uα l (r) = h2 2 2m r 2 + h2 2m l(l + 1) r 2 + Vsph α (r) ] r u α l (r) = E lu α l (r) where the u can be determined from the energy derivative of the Schrödinger-like equation Ĥ α sph uα l = E l u α l + uα l E l is a parameter and it is predetermined by optimization like the plane wave case there is a cut-off G max (typically ) unlike the plane waves there is also a cut-off (for open-shell atoms) on l max (typically 8) unlike the plane waves this set of basis functions is overcomplete it is NOT an orthogonal set a new set of basis functions is computed at each self-consistent cycle Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

15 Introduction to FLAPW Generalized eigenvalue problems The expansion of is φ k,ν (r) in terms of LAPW basis set is then inserted in the KS equations ψ G (k,r) Ĥ KS c G k,ν ψ G (k,r) = λ kν ψ G (k,r) c G k,ν ψ G (k,r), G G and, by defining the matrix entries for the left and right hand side respectively as Hamiltonian A k and overlap matrices B k, [A k B k ] GG = dr ψ G (k,r)[ Ĥ KS ˆ1 ] ψ G (k,r) α one arrives at generalized eigenvalue equations parametrized by k P k : (A k ) GG c G kν = λ kν (B k ) GG c G kν G G or, in compact form, A k x i = λ i B k x i. with x i. = c G kν. Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

16 Discretized Kohn-Sham scheme Self-consistent cycle Initial guess n start (r) = Compute KS potential V 0 (r)[n] Solve a set of eigenproblems P k1...p kn No OUTPUT Energy,... Yes = Converged? n (l+1) n (l) < η Compute new density n(r) = k,ν f k,ν φ k,ν (r) 2 Observations: 1 every P k : Ax = Bλx is a generalized eigenvalue problem; 2 A and B are DENSE and hermitian (B is also pos. def.); 3 P k s with different k index have different size and are independent from each other. 4 k= 1: ; i = 1:20-50 Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

17 A (mostly) converging process The traditional approach Initialization: At every iteration of the cycle all numerical quantities are entirely re-computed A new set of basis functions is assembled at each at iteration cycle (Math Sim) The entries of the matrices A and B are re-initialized at each iteration cycle (Algorithm Sim) Eigenproblems: Each eigenproblem P (l) : A (l) x = λb (l) x at iteration l is solved in total independence from the eigenproblem P (l 1) of the previous iteration (Algorithm Sim) Convergence: Starting with a electron density close enough to the one minimizing the energy E 0 it is likely to reach convergence within few tens of iterations. Unfortunately there is no general theorem establishing the converging conditions # of steps is still uncertain. They depend on the material and the initial guess (area of convergence) (Math Algorithm) n (r) n(r) undergo relatively small decreasing oscillations (Math Algorithm & Sim) Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

18 A (mostly) converging process The traditional approach Initialization: At every iteration of the cycle all numerical quantities are entirely re-computed A new set of basis functions is assembled at each at iteration cycle (Math Sim) The entries of the matrices A and B are re-initialized at each iteration cycle (Algorithm Sim) Eigenproblems: Each eigenproblem P (l) : A (l) x = λb (l) x at iteration l is solved in total independence from the eigenproblem P (l 1) of the previous iteration (Algorithm Sim) Convergence: Starting with a electron density close enough to the one minimizing the energy E 0 it is likely to reach convergence within few tens of iterations. Unfortunately there is no general theorem establishing the converging conditions # of steps is still uncertain. They depend on the material and the initial guess (area of convergence) (Math Algorithm) n (r) n(r) undergo relatively small decreasing oscillations (Math Algorithm & Sim) Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

19 Outline 1 Stating the problem: how sequences of generalized eigenproblems arise in all-electron computations 2 Algorithmic choice = Eigenvectors angle evolution 3 Tailoring the algorithm: Chebyshev Filtered Sub-space Iteration method (ChFSI) 4 Physics-driven performance boost: numerical results Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

20 FLAPW self-consistent cycle Computational costs Spherical V 0(r) n 0(r) init. Pseudo-charge method Calculation of full-potential V (r)[n] 40% time usage Convergence check n(r) mixing n(r) for next cycle Determining the Lapw wavefunctions ψ G(k, r) 5 10% time usage Calculation of new charge density n(r) a G lm and bg lm coeff. 3-D stars, 2-D stars lattice harmonics Matrices generation A k =< ψ(k) H ψ (k) > B k =< ψ(k) S ψ (k) > <1% time usage Fermi energy calculation Eigenpairs selection Generalized eigenproblems solution A kx = λb kx Eigenproblems: distribution and setup Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

21 Sequences of eigs: an ALGORITHM SIM case Sequences of eigenproblems Consider the set of generalized eigenproblems P (1)...P (l) P (l+1)...p (N) not as a set of disjoint problems (P) N, but as a sequence; { Could this sequence P (l)} of eigenproblems evolve following a convergence pattern in line with the convergence of n(r)? REVERSE SIMULATION method numerical simulations analyzed employing a parameter-based inverse problem method; collected data on deviation angles b/w eigenvectors of adjacent eigenproblems; identified evolutions of eigenvectors along the sequence. Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

22 Angle evolution fixed k Example: a metallic compound at fixed k of subspace angle for eigenvectors of k point 1 and lowest 75 eigs 0Evolution 10 AuAg Angle b/w eigenvectors of adjacent iterations Iterations (2 > 22) Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

23 Correlation and its exploitation correlation between successive eigenvectors x (l 1) and x (l) Angles decrease monotonically with some oscillation Majority of angles are small after the first few iterations Note: Mathematical model Correlation. Correlation numerical analysis of the simulation. ALGORITHM SIM The stage is favorable to an iterative eigensolver where the eigenvectors of P (l 1) are fed to the solve P (l). Next stages of the investigation: 1 Development of a block iterative eigensolver that can exploit the correlation 2 Investigate if approximate eigenvectors can speed-up the iterative solver of choice 3 Understand if such an iterative method be competitive with direct methods for dense problems Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

24 Algorithmic digression Direct solvers. Iterative solvers. Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

25 Algorithmic digression Direct solvers. Iterative solvers. Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

26 Algorithmic digression Direct solvers. Iterative solvers. λ 1 > λ 2 > λ 3 >... Ax j = λ j x j v = j γ j x j [ ] Av = j λ j γ j x j A k v = j λ k j γ λ jx j = λ 1 x 1 + j j 2 x λ 1 j λ j Rate of convergence magnitude of λ 1 Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

27 Algorithmic digression Direct solvers. Iterative solvers. Dense matrices. Sparse matrices. Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

28 Algorithmic digression Direct solvers. Iterative solvers. Dense matrices. Sparse matrices. Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

29 Outline 1 Stating the problem: how sequences of generalized eigenproblems arise in all-electron computations 2 Algorithmic choice = Eigenvectors angle evolution 3 Tailoring the algorithm: Chebyshev Filtered Sub-space Iteration method (ChFSI) 4 Physics-driven performance boost: numerical results Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

30 Chebyshev Filtered Sub-space Iteration method (ChFSI) Two are the essential properties our iterative algorithm has to comply with: Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

31 Chebyshev Filtered Sub-space Iteration method (ChFSI) Two are the essential properties our iterative algorithm has to comply with: 1 The ability to receive as input a sizable set Z 0 of approximate eigenvectors; 2 The capacity to solve simultaneously for a substantial portion of eigenpairs. Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

32 Chebyshev Filtered Sub-space Iteration method (ChFSI) Two are the essential properties our iterative algorithm has to comply with: 1 The ability to receive as input a sizable set Z 0 of approximate eigenvectors; 2 The capacity to solve simultaneously for a substantial portion of eigenpairs. ChFSI constitutes the natural choice: it accepts the full set of multiple starting vectors; it avoids stalling when facing small clusters of eigenvalues; when augmented with polynomial accelerators it has a much faster convergence rate; converged eigenvectors can be easily locked. Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

33 ChFSI pseudocode INPUT: Hamiltonian, approximate eigenpairs (Λ,Z 0 ), TOL, DEG. OUTPUT: Wanted eigenpairs W. 1 Lanczos step. Identify the bounds for the interval to be filtered out. REPEAT UNTIL CONVERGENCE: 2 Chebyshev filter. Filter a block of vectors W Z 0. 3 QR decomposition. Re-orthogonalize the vectors outputted by the filter; W = QR. 4 Compute the Rayleigh quotient G = Q HQ. 5 Compute the primitive Ritz pairs (Λ,Y) by solving for GY = YΛ. 6 Compute the approximate Ritz pairs (Λ,W QY). 7 Check which one among the Ritz vectors converged. 8 Deflate and lock the converged vectors. END REPEAT Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

34 The core of the algorithm: Chebyshev filter Chebyshev polynomials The Chebyshev polynomial C m of the first kind of order m, is defined as { cos(marccos(x)), x [ 1,1]; C m (x) = cosh(marccosh(x)), x > 1. Three-terms recurrence relation C m+1 (x) = 2xC m (x) C m 1 (x); m N, C 0 (x) = 1, C 1 (x) = x Degree 5 6 Degree 10 x Degree 15 x Degree 20 x Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

35 The core of the algorithm: Chebyshev filter The basic principle Theorem Let γ > 1 and P m denote the set of polynomials of degree smaller or equal to m. Then the extremum min max p(t) p P m,p(γ)=1 t [ 1,1] is reached by p m (t). = C m(t) C m (γ). A generic vector v is very quickly aligned in the direction of the eigenvector corresponding to the extremal eigenvalue λ 1 v m n n = p m (A)v = s i p m (A)x i = s i p m (λ i )x i i=1 i=1 n C m ( = s 1 x 1 + λ i c e ) s i i=2 C m ( λ 1 c e ) x i s 1 x 1 Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

36 The core of the algorithm: Chebyshev filter The pseudocode A simple linear transformation maps [ 1,1] [α,β] R c = β + α 2 center of the interval e = β α 2 width of the interval. INPUT: Hamiltonian H - approx. eigenvectors Z 0 lowest eigenv. λ 1 - parameters for interval c, e - DEG. OUTPUT: Filtered vectors W. 1 σ 1 e/(λ 1 c) 2 Z 1 σ 1 e (H ci n)z 0 FOR: i = 1 DEG σ i+1 (2/σ 1 σ i ) 4 Z i+1 2 σ i+1 e (H ci n)z i σ i+1 σ i Z i 1 END FOR. Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

37 Outline 1 Stating the problem: how sequences of generalized eigenproblems arise in all-electron computations 2 Algorithmic choice = Eigenvectors angle evolution 3 Tailoring the algorithm: Chebyshev Filtered Sub-space Iteration method (ChFSI) 4 Physics-driven performance boost: numerical results Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

38 Experimental tests setup Solving with a ChFSI version implemented in C Approx. vs Random vectors fed to ChFSI against Iteration Index; Parallel ChFSI vs Direct methods against Iteration Index. Matrix sizes: 2,600 13,300. B is in general almost singular (ill-conditioned). Examples: size(a) = 50 κ(a) 10 4 ; size(a) = 500 κ(a) 10 7 We used the standard form for the problem Ax = λbx A y = λy with A = L 1 AL T and y = L T x Tests were performed on JUROPA using 1 node with 8 cores. 2 Intel Xeon 5570 (Nehalem-EP) quad-core processors, 2.93GHz; 24 GB/node; THEORETICAL PEAK PERFORMANCE/CORE=11.71 GFLOPS; Minimum absolute tolerance of residuals r x = ; All numerical data are MEDIAN values over 12 distinct measurements. Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

39 Speed-up for sequential ChFSI Na 15 Cl 14 Li with increasing G max = 3.0,3.5, Speed up vs. Iteration index for Nev=256 and three distinct matrix sizes NaCl n=3893 NaCl n=6217 NaCl n= Speed up Iteration index Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

40 Speed-up for sequential ChFSI Au 98 Ag 10 with increasing G max = 3.0, Speed up vs. Iteration index for Nev=972 and 2 distinct matrix sizes AuAg n=5638 AuAg n= Speed up Iteration index Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

41 Sequential to Parallel: what and how? Fraction of computing time for sequential ChFSI extracted from AuAg system (n=8970, nev=972) at iteration 23. Lanczos < 1% 4% Residuals convergence 6% Rayleigh Ritz Chebyshev filter 90% Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

42 Sequential to Parallel: what and how? Speed-up of multi-threaded BLAS ChFSI vs sequential ChFSI over 8 cores when fed approx. eigenvectors. Speed up of Multi threaded ChFSI over Sequential ChFSI w.r.t. Iteration index AuAg n=8970 NaCl n= Speed up Iteration index Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

43 Sequential to Parallel: what and how? Speed-up of sequential ChFSI and multi-threaded BLAS ChFSI Speed up of approx. vs. random vectors w.r.t. Iteration index AuAg n= cores AuAg n= core 3.5 Speed up Iteration Index Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

44 Efficiency Ratio of performance respect to theoretical peak performance Table: Efficiency for Na 15 Cl 14 Li (n=9273) Self-consistent cycle Full Algorithm Chebyshev filter % 91.4% % 90.9% % 90.4% % 90.8% % 91.5% % 91.6% % 91.6% % 92.1% % 90.8% % 91.4% % 90.9% % 90.8% Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

45 Recapping 1 Approximate vs. Random: sequential ChFSI achieves speed-ups in the range 1.5X 3.5X; multi-threaded version of ChFSI achieve speed-ups up to 5X; 2 Multi-threaded ChFSI: by just using the multi-threaded version of BLAS, ChFSI achieve speed-ups above 7X; the larger the size of the eigenproblem the better ChFSI performs. 3 Performance: Extensive use of BLAS library in the filter facilitates close to optimal performance. UNANSWERED QUESTIONS: Can we do better on shared memory architectures? Can we be faster than the direct methods? Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

46 Recapping 1 Approximate vs. Random: sequential ChFSI achieves speed-ups in the range 1.5X 3.5X; multi-threaded version of ChFSI achieve speed-ups up to 5X; 2 Multi-threaded ChFSI: by just using the multi-threaded version of BLAS, ChFSI achieve speed-ups above 7X; the larger the size of the eigenproblem the better ChFSI performs. 3 Performance: Extensive use of BLAS library in the filter facilitates close to optimal performance. UNANSWERED QUESTIONS: Can we do better on shared memory architectures? YES OpenMP. Can we be faster than the direct methods? depends on the percentage of the eigenspectrum sought after and the iteration index. Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

47 Parallelizing the filter with OpenMP Input : Hamiltonian H vectors to be filtered Z DEG, c, e. Output: Filtered vectors W. 1: for i = 1 to DEG do 2: Compute α. Compute β. 3: W = αhz + βw. 4: SWAP(Z, W). 5: end for 6: SWAP(Z, W). Hamiltonian. Vectors Filtered to be = filtered. vectors Figure: Matrix matrix multiplication scheme. Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

48 OpenMP ChFSI vs multi-threaded ChFSI Time to completion for AuAg (n=13379) of OpenMP vs. multi threaded BLAS ChFSI multi threaded BLAS ChFSI OpenMP 1500 CPU time (seconds) Iteration Index Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

49 Scalability OpenMP ChFSI obtains almost ideal strong scalability: the larger the better. 8 BChFSI Strong scalability 7 6 AuAg n=8970 NaCl n=6217 CaFeAs n=2612 Ideal speed up Speed up Number of threads Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

50 OpenMP ChFSI vs Direct eigensolvers on shared memory architectures Comparison with multi-threaded LAPACK MR 3 and MKL multi-threaded LAPACK MR Time for LAPACK + m t BLAS and OMPChFSI for AuAg (n=13379) solving for lowest 7.3% of eigespectrum CPU time (seconds) LAPACK + multi thrd BLAS MKL LAPACK + multi thrd BLAS ChFSI OpenMP Iteration index Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

51 OpenMP ChFSI vs Direct eigensolvers on shared memory architectures Comparison with multi-threaded LAPACK MR 3 and MKL multi-threaded LAPACK MR Time for LAPACK + m t BLAS and OMPChFSI for NaCl (n=9273) 400 CPU time (seconds) LAPACK + multi thrd BLAS MKL LAPACK + multi thrd BLAS OpenMP ChFSI 100 solving for lowest 2.8% of eigenspectrum Iteration index Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

52 Conclusions and future work CONCLUSIONS Feeding eigenvectors of P (l 1) to a block iterative solver like ChFSI speed-ups the iterative solver for P (l) ; The improved use of computing resources together with strong scalability will enable to access larger physics; The algorithmic structure of FLAPW needed to be re-thought: reverse simulation can substantially influence the computational paradigm of an application; ONGOING AND FUTURE WORK ChFSI can be extended and improved; 1 Finalizing filter optimization by adjusting the degree of the polynomial so to just achieve the required eigenvector residuals. 2 Ongoing work to parallelize ChFSI for distributed memory architectures = Elemental; 3 Parallelization of ChFSI for GPUs by using OpenACC directives; Analysis on the structure of the entries of A and B across adjacent iteration seems to suggest an exploitation of low-rank updates for sequences of eigenproblems = Hierarchical matrices; Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

53 References 1 M. Berljafa, and EDN A parallel Chebyshev Filtered Subspace Iteration optimized for LAPW-based methods In preparation 2 EDN, and M. Berljafa Block iterative eigensolvers for sequences of correlated eigenvalue problems Submitted to Comp. Phys. Comm. [arxiv: ]. 3 EDN, P. Bientinesi, and S. Blügel, Correlation in sequences of generalized eigenproblems arising in Density Functional Theory, Comp. Phys. Comm. 183 (2012), pp , [arxiv: ]. Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

Block Iterative Eigensolvers for Sequences of Dense Correlated Eigenvalue Problems

Block Iterative Eigensolvers for Sequences of Dense Correlated Eigenvalue Problems Mitglied der Helmholtz-Gemeinschaft Block Iterative Eigensolvers for Sequences of Dense Correlated Eigenvalue Problems Birkbeck University, London, June the 29th 2012 Edoardo Di Napoli Motivation and Goals

More information

Mitglied der Helmholtz-Gemeinschaft. Linear algebra tasks in Materials Science: optimization and portability

Mitglied der Helmholtz-Gemeinschaft. Linear algebra tasks in Materials Science: optimization and portability Mitglied der Helmholtz-Gemeinschaft Linear algebra tasks in Materials Science: optimization and portability ADAC Workshop, July 17-19 2017 Edoardo Di Napoli Outline Jülich Supercomputing Center Chebyshev

More information

A knowledge-based approach to high-performance computing in ab initio simulations.

A knowledge-based approach to high-performance computing in ab initio simulations. Mitglied der Helmholtz-Gemeinschaft A knowledge-based approach to high-performance computing in ab initio simulations. AICES Advisory Board Meeting. July 14th 2014 Edoardo Di Napoli Academic background

More information

An Optimized and Scalable Eigensolver for Sequences of Eigenvalue Problems

An Optimized and Scalable Eigensolver for Sequences of Eigenvalue Problems An Optimized and Scalable Eigensolver for Sequences of Eigenvalue Problems Mario Berljafa Daniel Wortmann Edoardo Di Napoli arxiv:1404.4161v2 [cs.ms] 6 Jul 2014 July 8, 2014 Abstract In many scientific

More information

Quantum Theory of Materials

Quantum Theory of Materials Quantum Theory of Materials Introduction to Density Functional Theory and its Computational Challenges Edoardo Di Napoli Jülich Supercomputing Center - Institute for Advanced Simulation Forschungszentrum

More information

FEAST eigenvalue algorithm and solver: review and perspectives

FEAST eigenvalue algorithm and solver: review and perspectives FEAST eigenvalue algorithm and solver: review and perspectives Eric Polizzi Department of Electrical and Computer Engineering University of Masachusetts, Amherst, USA Sparse Days, CERFACS, June 25, 2012

More information

All-electron density functional theory on Intel MIC: Elk

All-electron density functional theory on Intel MIC: Elk All-electron density functional theory on Intel MIC: Elk W. Scott Thornton, R.J. Harrison Abstract We present the results of the porting of the full potential linear augmented plane-wave solver, Elk [1],

More information

Jacobi-Based Eigenvalue Solver on GPU. Lung-Sheng Chien, NVIDIA

Jacobi-Based Eigenvalue Solver on GPU. Lung-Sheng Chien, NVIDIA Jacobi-Based Eigenvalue Solver on GPU Lung-Sheng Chien, NVIDIA lchien@nvidia.com Outline Symmetric eigenvalue solver Experiment Applications Conclusions Symmetric eigenvalue solver The standard form is

More information

Chapter 3. The (L)APW+lo Method. 3.1 Choosing A Basis Set

Chapter 3. The (L)APW+lo Method. 3.1 Choosing A Basis Set Chapter 3 The (L)APW+lo Method 3.1 Choosing A Basis Set The Kohn-Sham equations (Eq. (2.17)) provide a formulation of how to practically find a solution to the Hohenberg-Kohn functional (Eq. (2.15)). Nevertheless

More information

The Algorithm of Multiple Relatively Robust Representations for Multi-Core Processors

The Algorithm of Multiple Relatively Robust Representations for Multi-Core Processors Aachen Institute for Advanced Study in Computational Engineering Science Preprint: AICES-2010/09-4 23/September/2010 The Algorithm of Multiple Relatively Robust Representations for Multi-Core Processors

More information

Efficient implementation of the overlap operator on multi-gpus

Efficient implementation of the overlap operator on multi-gpus Efficient implementation of the overlap operator on multi-gpus Andrei Alexandru Mike Lujan, Craig Pelissier, Ben Gamari, Frank Lee SAAHPC 2011 - University of Tennessee Outline Motivation Overlap operator

More information

SPARSE SOLVERS POISSON EQUATION. Margreet Nool. November 9, 2015 FOR THE. CWI, Multiscale Dynamics

SPARSE SOLVERS POISSON EQUATION. Margreet Nool. November 9, 2015 FOR THE. CWI, Multiscale Dynamics SPARSE SOLVERS FOR THE POISSON EQUATION Margreet Nool CWI, Multiscale Dynamics November 9, 2015 OUTLINE OF THIS TALK 1 FISHPACK, LAPACK, PARDISO 2 SYSTEM OVERVIEW OF CARTESIUS 3 POISSON EQUATION 4 SOLVERS

More information

A Green Function Method for Large Scale Electronic Structure Calculations. Rudolf Zeller

A Green Function Method for Large Scale Electronic Structure Calculations. Rudolf Zeller A Green Function Method for Large Scale Electronic Structure Calculations Rudolf Zeller Institute for Advanced Simulation, Forschungszentrum Jülich Electronic structure calculations (density functional

More information

GPU accelerated Arnoldi solver for small batched matrix

GPU accelerated Arnoldi solver for small batched matrix 15. 09. 22 GPU accelerated Arnoldi solver for small batched matrix Samsung Advanced Institute of Technology Hyung-Jin Kim Contents - Eigen value problems - Solution - Arnoldi Algorithm - Target - CUDA

More information

Solution of eigenvalue problems. Subspace iteration, The symmetric Lanczos algorithm. Harmonic Ritz values, Jacobi-Davidson s method

Solution of eigenvalue problems. Subspace iteration, The symmetric Lanczos algorithm. Harmonic Ritz values, Jacobi-Davidson s method Solution of eigenvalue problems Introduction motivation Projection methods for eigenvalue problems Subspace iteration, The symmetric Lanczos algorithm Nonsymmetric Lanczos procedure; Implicit restarts

More information

Preconditioned Eigenvalue Solvers for electronic structure calculations. Andrew V. Knyazev. Householder Symposium XVI May 26, 2005

Preconditioned Eigenvalue Solvers for electronic structure calculations. Andrew V. Knyazev. Householder Symposium XVI May 26, 2005 1 Preconditioned Eigenvalue Solvers for electronic structure calculations Andrew V. Knyazev Department of Mathematics and Center for Computational Mathematics University of Colorado at Denver Householder

More information

Arnoldi Methods in SLEPc

Arnoldi Methods in SLEPc Scalable Library for Eigenvalue Problem Computations SLEPc Technical Report STR-4 Available at http://slepc.upv.es Arnoldi Methods in SLEPc V. Hernández J. E. Román A. Tomás V. Vidal Last update: October,

More information

VASP: running on HPC resources. University of Vienna, Faculty of Physics and Center for Computational Materials Science, Vienna, Austria

VASP: running on HPC resources. University of Vienna, Faculty of Physics and Center for Computational Materials Science, Vienna, Austria VASP: running on HPC resources University of Vienna, Faculty of Physics and Center for Computational Materials Science, Vienna, Austria The Many-Body Schrödinger equation 0 @ 1 2 X i i + X i Ĥ (r 1,...,r

More information

PFEAST: A High Performance Sparse Eigenvalue Solver Using Distributed-Memory Linear Solvers

PFEAST: A High Performance Sparse Eigenvalue Solver Using Distributed-Memory Linear Solvers PFEAST: A High Performance Sparse Eigenvalue Solver Using Distributed-Memory Linear Solvers James Kestyn, Vasileios Kalantzis, Eric Polizzi, Yousef Saad Electrical and Computer Engineering Department,

More information

Accelerating linear algebra computations with hybrid GPU-multicore systems.

Accelerating linear algebra computations with hybrid GPU-multicore systems. Accelerating linear algebra computations with hybrid GPU-multicore systems. Marc Baboulin INRIA/Université Paris-Sud joint work with Jack Dongarra (University of Tennessee and Oak Ridge National Laboratory)

More information

Solution of eigenvalue problems. Subspace iteration, The symmetric Lanczos algorithm. Harmonic Ritz values, Jacobi-Davidson s method

Solution of eigenvalue problems. Subspace iteration, The symmetric Lanczos algorithm. Harmonic Ritz values, Jacobi-Davidson s method Solution of eigenvalue problems Introduction motivation Projection methods for eigenvalue problems Subspace iteration, The symmetric Lanczos algorithm Nonsymmetric Lanczos procedure; Implicit restarts

More information

Eigenvalue Problems CHAPTER 1 : PRELIMINARIES

Eigenvalue Problems CHAPTER 1 : PRELIMINARIES Eigenvalue Problems CHAPTER 1 : PRELIMINARIES Heinrich Voss voss@tu-harburg.de Hamburg University of Technology Institute of Mathematics TUHH Heinrich Voss Preliminaries Eigenvalue problems 2012 1 / 14

More information

ab initio Electronic Structure Calculations

ab initio Electronic Structure Calculations ab initio Electronic Structure Calculations New scalability frontiers using the BG/L Supercomputer C. Bekas, A. Curioni and W. Andreoni IBM, Zurich Research Laboratory Rueschlikon 8803, Switzerland ab

More information

ECS130 Scientific Computing Handout E February 13, 2017

ECS130 Scientific Computing Handout E February 13, 2017 ECS130 Scientific Computing Handout E February 13, 2017 1. The Power Method (a) Pseudocode: Power Iteration Given an initial vector u 0, t i+1 = Au i u i+1 = t i+1 / t i+1 2 (approximate eigenvector) θ

More information

Accelerating computation of eigenvectors in the nonsymmetric eigenvalue problem

Accelerating computation of eigenvectors in the nonsymmetric eigenvalue problem Accelerating computation of eigenvectors in the nonsymmetric eigenvalue problem Mark Gates 1, Azzam Haidar 1, and Jack Dongarra 1,2,3 1 University of Tennessee, Knoxville, TN, USA 2 Oak Ridge National

More information

arxiv: v1 [cs.ce] 31 Oct 2016

arxiv: v1 [cs.ce] 31 Oct 2016 Hybrid CPU-GPU generation of the Hamiltonian and Overlap matrices in FLAPW methods arxiv:1611.00606v1 [cs.ce] 31 Oct 2016 Diego Fabregat-Traver, Davor Davidović, Markus Höhnerbach, and Edoardo di Napoli

More information

Accelerating computation of eigenvectors in the dense nonsymmetric eigenvalue problem

Accelerating computation of eigenvectors in the dense nonsymmetric eigenvalue problem Accelerating computation of eigenvectors in the dense nonsymmetric eigenvalue problem Mark Gates 1, Azzam Haidar 1, and Jack Dongarra 1,2,3 1 University of Tennessee, Knoxville, TN, USA 2 Oak Ridge National

More information

Jacobi-Davidson Eigensolver in Cusolver Library. Lung-Sheng Chien, NVIDIA

Jacobi-Davidson Eigensolver in Cusolver Library. Lung-Sheng Chien, NVIDIA Jacobi-Davidson Eigensolver in Cusolver Library Lung-Sheng Chien, NVIDIA lchien@nvidia.com Outline CuSolver library - cusolverdn: dense LAPACK - cusolversp: sparse LAPACK - cusolverrf: refactorization

More information

DUAL REGULARIZED TOTAL LEAST SQUARES SOLUTION FROM TWO-PARAMETER TRUST-REGION ALGORITHM. Geunseop Lee

DUAL REGULARIZED TOTAL LEAST SQUARES SOLUTION FROM TWO-PARAMETER TRUST-REGION ALGORITHM. Geunseop Lee J. Korean Math. Soc. 0 (0), No. 0, pp. 1 0 https://doi.org/10.4134/jkms.j160152 pissn: 0304-9914 / eissn: 2234-3008 DUAL REGULARIZED TOTAL LEAST SQUARES SOLUTION FROM TWO-PARAMETER TRUST-REGION ALGORITHM

More information

ELSI: A Unified Software Interface for Kohn-Sham Electronic Structure Solvers

ELSI: A Unified Software Interface for Kohn-Sham Electronic Structure Solvers ELSI: A Unified Software Interface for Kohn-Sham Electronic Structure Solvers Victor Yu and the ELSI team Department of Mechanical Engineering & Materials Science Duke University Kohn-Sham Density-Functional

More information

A model leading to self-consistent iteration computation with need for HP LA (e.g, diagonalization and orthogonalization)

A model leading to self-consistent iteration computation with need for HP LA (e.g, diagonalization and orthogonalization) A model leading to self-consistent iteration computation with need for HP LA (e.g, diagonalization and orthogonalization) Schodinger equation: Hψ = Eψ Choose a basis set of wave functions Two cases: Orthonormal

More information

Some Geometric and Algebraic Aspects of Domain Decomposition Methods

Some Geometric and Algebraic Aspects of Domain Decomposition Methods Some Geometric and Algebraic Aspects of Domain Decomposition Methods D.S.Butyugin 1, Y.L.Gurieva 1, V.P.Ilin 1,2, and D.V.Perevozkin 1 Abstract Some geometric and algebraic aspects of various domain decomposition

More information

Solid State Theory: Band Structure Methods

Solid State Theory: Band Structure Methods Solid State Theory: Band Structure Methods Lilia Boeri Wed., 11:15-12:45 HS P3 (PH02112) http://itp.tugraz.at/lv/boeri/ele/ Plan of the Lecture: DFT1+2: Hohenberg-Kohn Theorem and Kohn and Sham equations.

More information

Is there life after the Lanczos method? What is LOBPCG?

Is there life after the Lanczos method? What is LOBPCG? 1 Is there life after the Lanczos method? What is LOBPCG? Andrew V. Knyazev Department of Mathematics and Center for Computational Mathematics University of Colorado at Denver SIAM ALA Meeting, July 17,

More information

Domain Decomposition-based contour integration eigenvalue solvers

Domain Decomposition-based contour integration eigenvalue solvers Domain Decomposition-based contour integration eigenvalue solvers Vassilis Kalantzis joint work with Yousef Saad Computer Science and Engineering Department University of Minnesota - Twin Cities, USA SIAM

More information

Density Functional Theory

Density Functional Theory Density Functional Theory March 26, 2009 ? DENSITY FUNCTIONAL THEORY is a method to successfully describe the behavior of atomic and molecular systems and is used for instance for: structural prediction

More information

Numerical Methods I Non-Square and Sparse Linear Systems

Numerical Methods I Non-Square and Sparse Linear Systems Numerical Methods I Non-Square and Sparse Linear Systems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 September 25th, 2014 A. Donev (Courant

More information

Applications of Matrix Functions Part III: Quantum Chemistry

Applications of Matrix Functions Part III: Quantum Chemistry Applications of Matrix Functions Part III: Quantum Chemistry Emory University Department of Mathematics and Computer Science Atlanta, GA 30322, USA Prologue The main purpose of this lecture is to present

More information

ELECTRONIC STRUCTURE CALCULATIONS FOR THE SOLID STATE PHYSICS

ELECTRONIC STRUCTURE CALCULATIONS FOR THE SOLID STATE PHYSICS FROM RESEARCH TO INDUSTRY 32 ème forum ORAP 10 octobre 2013 Maison de la Simulation, Saclay, France ELECTRONIC STRUCTURE CALCULATIONS FOR THE SOLID STATE PHYSICS APPLICATION ON HPC, BLOCKING POINTS, Marc

More information

Last Time. Social Network Graphs Betweenness. Graph Laplacian. Girvan-Newman Algorithm. Spectral Bisection

Last Time. Social Network Graphs Betweenness. Graph Laplacian. Girvan-Newman Algorithm. Spectral Bisection Eigenvalue Problems Last Time Social Network Graphs Betweenness Girvan-Newman Algorithm Graph Laplacian Spectral Bisection λ 2, w 2 Today Small deviation into eigenvalue problems Formulation Standard eigenvalue

More information

APPLIED NUMERICAL LINEAR ALGEBRA

APPLIED NUMERICAL LINEAR ALGEBRA APPLIED NUMERICAL LINEAR ALGEBRA James W. Demmel University of California Berkeley, California Society for Industrial and Applied Mathematics Philadelphia Contents Preface 1 Introduction 1 1.1 Basic Notation

More information

Numerical Methods I Eigenvalue Problems

Numerical Methods I Eigenvalue Problems Numerical Methods I Eigenvalue Problems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 October 2nd, 2014 A. Donev (Courant Institute) Lecture

More information

Polynomial filtering for interior eigenvalue problems Yousef Saad Department of Computer Science and Engineering University of Minnesota

Polynomial filtering for interior eigenvalue problems Yousef Saad Department of Computer Science and Engineering University of Minnesota Polynomial filtering for interior eigenvalue problems Yousef Saad Department of Computer Science and Engineering University of Minnesota SIAM CSE Boston - March 1, 2013 First: Joint work with: Haw-ren

More information

A Non-Linear Eigensolver-Based Alternative to Traditional Self-Consistent Electronic Structure Calculation Methods

A Non-Linear Eigensolver-Based Alternative to Traditional Self-Consistent Electronic Structure Calculation Methods University of Massachusetts Amherst ScholarWorks@UMass Amherst Masters Theses 1911 - February 2014 2013 A Non-Linear Eigensolver-Based Alternative to Traditional Self-Consistent Electronic Structure Calculation

More information

EIGIFP: A MATLAB Program for Solving Large Symmetric Generalized Eigenvalue Problems

EIGIFP: A MATLAB Program for Solving Large Symmetric Generalized Eigenvalue Problems EIGIFP: A MATLAB Program for Solving Large Symmetric Generalized Eigenvalue Problems JAMES H. MONEY and QIANG YE UNIVERSITY OF KENTUCKY eigifp is a MATLAB program for computing a few extreme eigenvalues

More information

LARGE SPARSE EIGENVALUE PROBLEMS

LARGE SPARSE EIGENVALUE PROBLEMS LARGE SPARSE EIGENVALUE PROBLEMS Projection methods The subspace iteration Krylov subspace methods: Arnoldi and Lanczos Golub-Kahan-Lanczos bidiagonalization 14-1 General Tools for Solving Large Eigen-Problems

More information

A Jacobi Davidson Method with a Multigrid Solver for the Hermitian Wilson-Dirac Operator

A Jacobi Davidson Method with a Multigrid Solver for the Hermitian Wilson-Dirac Operator A Jacobi Davidson Method with a Multigrid Solver for the Hermitian Wilson-Dirac Operator Artur Strebel Bergische Universität Wuppertal August 3, 2016 Joint Work This project is joint work with: Gunnar

More information

ILAS 2017 July 25, 2017

ILAS 2017 July 25, 2017 Polynomial and rational filtering for eigenvalue problems and the EVSL project Yousef Saad Department of Computer Science and Engineering University of Minnesota ILAS 217 July 25, 217 Large eigenvalue

More information

Direct Self-Consistent Field Computations on GPU Clusters

Direct Self-Consistent Field Computations on GPU Clusters Direct Self-Consistent Field Computations on GPU Clusters Guochun Shi, Volodymyr Kindratenko National Center for Supercomputing Applications University of Illinois at UrbanaChampaign Ivan Ufimtsev, Todd

More information

Opportunities for ELPA to Accelerate the Solution of the Bethe-Salpeter Eigenvalue Problem

Opportunities for ELPA to Accelerate the Solution of the Bethe-Salpeter Eigenvalue Problem Opportunities for ELPA to Accelerate the Solution of the Bethe-Salpeter Eigenvalue Problem Peter Benner, Andreas Marek, Carolin Penke August 16, 2018 ELSI Workshop 2018 Partners: The Problem The Bethe-Salpeter

More information

Solving Symmetric Semi-definite (ill-conditioned) Generalized Eigenvalue Problems

Solving Symmetric Semi-definite (ill-conditioned) Generalized Eigenvalue Problems Solving Symmetric Semi-definite (ill-conditioned) Generalized Eigenvalue Problems Zhaojun Bai University of California, Davis Berkeley, August 19, 2016 Symmetric definite generalized eigenvalue problem

More information

Density Functional Theory

Density Functional Theory Density Functional Theory Iain Bethune EPCC ibethune@epcc.ed.ac.uk Overview Background Classical Atomistic Simulation Essential Quantum Mechanics DFT: Approximations and Theory DFT: Implementation using

More information

Behind the "exciting" curtain: The (L)APW+lo method

Behind the exciting curtain: The (L)APW+lo method Behind the "exciting" curtain: The (L)APW+lo method Aug 7, 2016 Andris Gulans Humboldt-Universität zu Berlin Kohn-Sham equation Potential due to nuclei Exchange-correlation potential Potential due to electron

More information

Solving Many-Body Schrödinger Equation Using Density Functional Theory and Finite Elements

Solving Many-Body Schrödinger Equation Using Density Functional Theory and Finite Elements Solving Many-Body Schrödinger Equation Using Density Functional Theory and Finite Elements Institute of Physics, Academy of Sciences of the Czech Republic June 21, 2008 Introduction Contens Density Functional

More information

CME342 Parallel Methods in Numerical Analysis. Matrix Computation: Iterative Methods II. Sparse Matrix-vector Multiplication.

CME342 Parallel Methods in Numerical Analysis. Matrix Computation: Iterative Methods II. Sparse Matrix-vector Multiplication. CME342 Parallel Methods in Numerical Analysis Matrix Computation: Iterative Methods II Outline: CG & its parallelization. Sparse Matrix-vector Multiplication. 1 Basic iterative methods: Ax = b r = b Ax

More information

Parallelization of the Molecular Orbital Program MOS-F

Parallelization of the Molecular Orbital Program MOS-F Parallelization of the Molecular Orbital Program MOS-F Akira Asato, Satoshi Onodera, Yoshie Inada, Elena Akhmatskaya, Ross Nobes, Azuma Matsuura, Atsuya Takahashi November 2003 Fujitsu Laboratories of

More information

Computational Methods. Eigenvalues and Singular Values

Computational Methods. Eigenvalues and Singular Values Computational Methods Eigenvalues and Singular Values Manfred Huber 2010 1 Eigenvalues and Singular Values Eigenvalues and singular values describe important aspects of transformations and of data relations

More information

Karhunen-Loève Approximation of Random Fields Using Hierarchical Matrix Techniques

Karhunen-Loève Approximation of Random Fields Using Hierarchical Matrix Techniques Institut für Numerische Mathematik und Optimierung Karhunen-Loève Approximation of Random Fields Using Hierarchical Matrix Techniques Oliver Ernst Computational Methods with Applications Harrachov, CR,

More information

References. Documentation Manuals Tutorials Publications

References.   Documentation Manuals Tutorials Publications References http://siesta.icmab.es Documentation Manuals Tutorials Publications Atomic units e = m e = =1 atomic mass unit = m e atomic length unit = 1 Bohr = 0.5292 Ang atomic energy unit = 1 Hartree =

More information

Computing least squares condition numbers on hybrid multicore/gpu systems

Computing least squares condition numbers on hybrid multicore/gpu systems Computing least squares condition numbers on hybrid multicore/gpu systems M. Baboulin and J. Dongarra and R. Lacroix Abstract This paper presents an efficient computation for least squares conditioning

More information

Applications of Renormalization Group Methods in Nuclear Physics 2

Applications of Renormalization Group Methods in Nuclear Physics 2 Applications of Renormalization Group Methods in Nuclear Physics 2 Dick Furnstahl Department of Physics Ohio State University HUGS 2014 Outline: Lecture 2 Lecture 2: SRG in practice Recap from lecture

More information

DFT / SIESTA algorithms

DFT / SIESTA algorithms DFT / SIESTA algorithms Javier Junquera José M. Soler References http://siesta.icmab.es Documentation Tutorials Atomic units e = m e = =1 atomic mass unit = m e atomic length unit = 1 Bohr = 0.5292 Ang

More information

Accelerating Model Reduction of Large Linear Systems with Graphics Processors

Accelerating Model Reduction of Large Linear Systems with Graphics Processors Accelerating Model Reduction of Large Linear Systems with Graphics Processors P. Benner 1, P. Ezzatti 2, D. Kressner 3, E.S. Quintana-Ortí 4, Alfredo Remón 4 1 Max-Plank-Institute for Dynamics of Complex

More information

TR A Comparison of the Performance of SaP::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems

TR A Comparison of the Performance of SaP::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems TR-0-07 A Comparison of the Performance of ::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems Ang Li, Omkar Deshmukh, Radu Serban, Dan Negrut May, 0 Abstract ::GPU is a

More information

Finding Rightmost Eigenvalues of Large, Sparse, Nonsymmetric Parameterized Eigenvalue Problems

Finding Rightmost Eigenvalues of Large, Sparse, Nonsymmetric Parameterized Eigenvalue Problems Finding Rightmost Eigenvalues of Large, Sparse, Nonsymmetric Parameterized Eigenvalue Problems AMSC 663-664 Final Report Minghao Wu AMSC Program mwu@math.umd.edu Dr. Howard Elman Department of Computer

More information

LARGE SPARSE EIGENVALUE PROBLEMS. General Tools for Solving Large Eigen-Problems

LARGE SPARSE EIGENVALUE PROBLEMS. General Tools for Solving Large Eigen-Problems LARGE SPARSE EIGENVALUE PROBLEMS Projection methods The subspace iteration Krylov subspace methods: Arnoldi and Lanczos Golub-Kahan-Lanczos bidiagonalization General Tools for Solving Large Eigen-Problems

More information

Parallel Eigensolver Performance on High Performance Computers

Parallel Eigensolver Performance on High Performance Computers Parallel Eigensolver Performance on High Performance Computers Andrew Sunderland Advanced Research Computing Group STFC Daresbury Laboratory CUG 2008 Helsinki 1 Summary (Briefly) Introduce parallel diagonalization

More information

Density Functional Theory. Martin Lüders Daresbury Laboratory

Density Functional Theory. Martin Lüders Daresbury Laboratory Density Functional Theory Martin Lüders Daresbury Laboratory Ab initio Calculations Hamiltonian: (without external fields, non-relativistic) impossible to solve exactly!! Electrons Nuclei Electron-Nuclei

More information

A Chebyshev-based two-stage iterative method as an alternative to the direct solution of linear systems

A Chebyshev-based two-stage iterative method as an alternative to the direct solution of linear systems A Chebyshev-based two-stage iterative method as an alternative to the direct solution of linear systems Mario Arioli m.arioli@rl.ac.uk CCLRC-Rutherford Appleton Laboratory with Daniel Ruiz (E.N.S.E.E.I.H.T)

More information

Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers

Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers UT College of Engineering Tutorial Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers Stan Tomov 1, George Bosilca 1, and Cédric

More information

Computation of eigenvalues and singular values Recall that your solutions to these questions will not be collected or evaluated.

Computation of eigenvalues and singular values Recall that your solutions to these questions will not be collected or evaluated. Math 504, Homework 5 Computation of eigenvalues and singular values Recall that your solutions to these questions will not be collected or evaluated 1 Find the eigenvalues and the associated eigenspaces

More information

Dense Arithmetic over Finite Fields with CUMODP

Dense Arithmetic over Finite Fields with CUMODP Dense Arithmetic over Finite Fields with CUMODP Sardar Anisul Haque 1 Xin Li 2 Farnam Mansouri 1 Marc Moreno Maza 1 Wei Pan 3 Ning Xie 1 1 University of Western Ontario, Canada 2 Universidad Carlos III,

More information

Parallelization Strategies for Density Matrix Renormalization Group algorithms on Shared-Memory Systems

Parallelization Strategies for Density Matrix Renormalization Group algorithms on Shared-Memory Systems Parallelization Strategies for Density Matrix Renormalization Group algorithms on Shared-Memory Systems G. Hager HPC Services, Computing Center Erlangen, Germany E. Jeckelmann Theoretical Physics, Univ.

More information

Solving Ax = b, an overview. Program

Solving Ax = b, an overview. Program Numerical Linear Algebra Improving iterative solvers: preconditioning, deflation, numerical software and parallelisation Gerard Sleijpen and Martin van Gijzen November 29, 27 Solving Ax = b, an overview

More information

Divide and conquer algorithms for large eigenvalue problems Yousef Saad Department of Computer Science and Engineering University of Minnesota

Divide and conquer algorithms for large eigenvalue problems Yousef Saad Department of Computer Science and Engineering University of Minnesota Divide and conquer algorithms for large eigenvalue problems Yousef Saad Department of Computer Science and Engineering University of Minnesota PMAA 14 Lugano, July 4, 2014 Collaborators: Joint work with:

More information

Inexact and Nonlinear Extensions of the FEAST Eigenvalue Algorithm

Inexact and Nonlinear Extensions of the FEAST Eigenvalue Algorithm University of Massachusetts Amherst ScholarWorks@UMass Amherst Doctoral Dissertations Dissertations and Theses 2018 Inexact and Nonlinear Extensions of the FEAST Eigenvalue Algorithm Brendan E. Gavin University

More information

Electronic Structure Methodology 1

Electronic Structure Methodology 1 Electronic Structure Methodology 1 Chris J. Pickard Lecture Two Working with Density Functional Theory In the last lecture we learnt how to write the total energy as a functional of the density n(r): E

More information

Recent implementations, applications, and extensions of the Locally Optimal Block Preconditioned Conjugate Gradient method LOBPCG

Recent implementations, applications, and extensions of the Locally Optimal Block Preconditioned Conjugate Gradient method LOBPCG MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Recent implementations, applications, and extensions of the Locally Optimal Block Preconditioned Conjugate Gradient method LOBPCG Knyazev,

More information

Rational Krylov methods for linear and nonlinear eigenvalue problems

Rational Krylov methods for linear and nonlinear eigenvalue problems Rational Krylov methods for linear and nonlinear eigenvalue problems Mele Giampaolo mele@mail.dm.unipi.it University of Pisa 7 March 2014 Outline Arnoldi (and its variants) for linear eigenproblems Rational

More information

DELFT UNIVERSITY OF TECHNOLOGY

DELFT UNIVERSITY OF TECHNOLOGY DELFT UNIVERSITY OF TECHNOLOGY REPORT -09 Computational and Sensitivity Aspects of Eigenvalue-Based Methods for the Large-Scale Trust-Region Subproblem Marielba Rojas, Bjørn H. Fotland, and Trond Steihaug

More information

A hybrid reordered Arnoldi method to accelerate PageRank computations

A hybrid reordered Arnoldi method to accelerate PageRank computations A hybrid reordered Arnoldi method to accelerate PageRank computations Danielle Parker Final Presentation Background Modeling the Web The Web The Graph (A) Ranks of Web pages v = v 1... Dominant Eigenvector

More information

Translation Symmetry, Space Groups, Bloch functions, Fermi energy

Translation Symmetry, Space Groups, Bloch functions, Fermi energy Translation Symmetry, Space Groups, Bloch functions, Fermi energy Roberto Orlando and Silvia Casassa Università degli Studi di Torino July 20, 2015 School Ab initio Modelling of Solids (UniTo) Symmetry

More information

Notes on PCG for Sparse Linear Systems

Notes on PCG for Sparse Linear Systems Notes on PCG for Sparse Linear Systems Luca Bergamaschi Department of Civil Environmental and Architectural Engineering University of Padova e-mail luca.bergamaschi@unipd.it webpage www.dmsa.unipd.it/

More information

The electronic structure of materials 2 - DFT

The electronic structure of materials 2 - DFT Quantum mechanics 2 - Lecture 9 December 19, 2012 1 Density functional theory (DFT) 2 Literature Contents 1 Density functional theory (DFT) 2 Literature Historical background The beginnings: L. de Broglie

More information

CHAPTER 3 WIEN2k. Chapter 3 : WIEN2k 50

CHAPTER 3 WIEN2k. Chapter 3 : WIEN2k 50 CHAPTER 3 WIEN2k WIEN2k is one of the fastest and reliable simulation codes among computational methods. All the computational work presented on lanthanide intermetallic compounds has been performed by

More information

Incomplete Cholesky preconditioners that exploit the low-rank property

Incomplete Cholesky preconditioners that exploit the low-rank property anapov@ulb.ac.be ; http://homepages.ulb.ac.be/ anapov/ 1 / 35 Incomplete Cholesky preconditioners that exploit the low-rank property (theory and practice) Artem Napov Service de Métrologie Nucléaire, Université

More information

Electronic Structure of Crystalline Solids

Electronic Structure of Crystalline Solids Electronic Structure of Crystalline Solids Computing the electronic structure of electrons in solid materials (insulators, conductors, semiconductors, superconductors) is in general a very difficult problem

More information

Density-Matrix-Based Algorithms for Solving Eingenvalue Problems

Density-Matrix-Based Algorithms for Solving Eingenvalue Problems University of Massachusetts - Amherst From the SelectedWorks of Eric Polizzi 2009 Density-Matrix-Based Algorithms for Solving Eingenvalue Problems Eric Polizzi, University of Massachusetts - Amherst Available

More information

Krylov Subspace Methods to Calculate PageRank

Krylov Subspace Methods to Calculate PageRank Krylov Subspace Methods to Calculate PageRank B. Vadala-Roth REU Final Presentation August 1st, 2013 How does Google Rank Web Pages? The Web The Graph (A) Ranks of Web pages v = v 1... Dominant Eigenvector

More information

Correlations in Sequences of Generalized Eigenproblems Arising in Density Functional Theory

Correlations in Sequences of Generalized Eigenproblems Arising in Density Functional Theory Aachen Institute for Advanced Study in Computational Engineering Science Preprint: AICES-211/8-1 12/August/211 Correlations in Sequences of Generalized Eigenproblems Arising in Density Functional Theory

More information

Matrix Algorithms. Volume II: Eigensystems. G. W. Stewart H1HJ1L. University of Maryland College Park, Maryland

Matrix Algorithms. Volume II: Eigensystems. G. W. Stewart H1HJ1L. University of Maryland College Park, Maryland Matrix Algorithms Volume II: Eigensystems G. W. Stewart University of Maryland College Park, Maryland H1HJ1L Society for Industrial and Applied Mathematics Philadelphia CONTENTS Algorithms Preface xv xvii

More information

Krylov subspace projection methods

Krylov subspace projection methods I.1.(a) Krylov subspace projection methods Orthogonal projection technique : framework Let A be an n n complex matrix and K be an m-dimensional subspace of C n. An orthogonal projection technique seeks

More information

arxiv: v1 [hep-lat] 2 May 2012

arxiv: v1 [hep-lat] 2 May 2012 A CG Method for Multiple Right Hand Sides and Multiple Shifts in Lattice QCD Calculations arxiv:1205.0359v1 [hep-lat] 2 May 2012 Fachbereich C, Mathematik und Naturwissenschaften, Bergische Universität

More information

Heterogeneous programming for hybrid CPU-GPU systems: Lessons learned from computational chemistry

Heterogeneous programming for hybrid CPU-GPU systems: Lessons learned from computational chemistry Heterogeneous programming for hybrid CPU-GPU systems: Lessons learned from computational chemistry and Eugene DePrince Argonne National Laboratory (LCF and CNM) (Eugene moved to Georgia Tech last week)

More information

Orthogonal iteration to QR

Orthogonal iteration to QR Notes for 2016-03-09 Orthogonal iteration to QR The QR iteration is the workhorse for solving the nonsymmetric eigenvalue problem. Unfortunately, while the iteration itself is simple to write, the derivation

More information

Iterative solvers for linear equations

Iterative solvers for linear equations Spectral Graph Theory Lecture 23 Iterative solvers for linear equations Daniel A. Spielman November 26, 2018 23.1 Overview In this and the next lecture, I will discuss iterative algorithms for solving

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning

AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning Xiangmin Jiao SUNY Stony Brook Xiangmin Jiao Numerical Analysis I 1 / 18 Outline

More information

problem Au = u by constructing an orthonormal basis V k = [v 1 ; : : : ; v k ], at each k th iteration step, and then nding an approximation for the e

problem Au = u by constructing an orthonormal basis V k = [v 1 ; : : : ; v k ], at each k th iteration step, and then nding an approximation for the e A Parallel Solver for Extreme Eigenpairs 1 Leonardo Borges and Suely Oliveira 2 Computer Science Department, Texas A&M University, College Station, TX 77843-3112, USA. Abstract. In this paper a parallel

More information

Cucheb: A GPU implementation of the filtered Lanczos procedure. Department of Computer Science and Engineering University of Minnesota, Twin Cities

Cucheb: A GPU implementation of the filtered Lanczos procedure. Department of Computer Science and Engineering University of Minnesota, Twin Cities Cucheb: A GPU implementation of the filtered Lanczos procedure Jared L. Aurentz, Vassilis Kalantzis, and Yousef Saad May 2017 EPrint ID: 2017.1 Department of Computer Science and Engineering University

More information

Introduction to numerical computations on the GPU

Introduction to numerical computations on the GPU Introduction to numerical computations on the GPU Lucian Covaci http://lucian.covaci.org/cuda.pdf Tuesday 1 November 11 1 2 Outline: NVIDIA Tesla and Geforce video cards: architecture CUDA - C: programming

More information