Improving the performance of applied science numerical simulations: an application to Density Functional Theory
1 Improving the performance of applied science numerical simulations: an application to Density Functional Theory
Edoardo Di Napoli
Jülich Supercomputing Center - Institute for Advanced Simulation, Forschungszentrum Jülich GmbH
Aachen Institute for Advanced Study in Computational Engineering Science, RWTH Aachen University
Columbia University, March 5th 2013
Edoardo Di Napoli (AICES/JSC) An example of reverse simulation Columbia University, March 5th / 40

3 Motivation and Goals
Two Facts
- Often algorithmic libraries are used as black boxes.
- Very little information coming from the physics of the specific application is exploited by scientific computing codes.
One Objective
Exploiting physical information extracted from the simulations in order to:
- increase the performance of large legacy codes;
- improve the computational paradigm on which the codes are based.

4 Computational science: Traditional approach
Science-driven workflow: Mathematical Model ⇒ Algorithmic Structure ⇒ Simulations ⇒ Physics
- Math/Algorithm: Often a literal translation of the mathematical model into a combination of algorithmic tasks, usually designed in abstraction from other considerations (kernel optimization, task harmonization, etc.);
- Algorithm/Sim: The simulation is an end product: the successful implementation of an algorithm as a series of machine-accessible operations resulting in the computation of meaningful physical quantities;
- Math/Sim: The mathematical model and the simulation are considered practically disjoint.

5 Computational science: Modern approach
Science-driven + high-performance workflow: Mathematical Model ⇔ Algorithmic Structure ⇔ Simulations ⇒ Physics
- Math/Algorithm: The mathematical model undergoes substantial reformulation in order to cast operations in terms of optimized algorithmic structures. At the same time, concerns regarding numerical stability and accuracy are carefully taken into consideration and included in the reformulation;
- Algorithm/Sim: Simulations can become a tool to uncover hidden aspects of physical phenomena missed by the mathematical model. Information extracted from a careful analysis of the computations can greatly influence the choice of the optimal algorithm and how it should be used;
- Math/Sim: Some information extracted from a careful analysis of the simulations can influence the algorithmic choice so much as to suggest a complete revolution in the formulation of the mathematical model.

6 Reverse Simulation
[Diagram: in the FLAPW paradigm, the mathematical model and the algorithmic structure drive a set of simulations (SIM) that compute physical quantities: total energy, band energy gap, conductivity, forces, etc. Feedback from the analysis of the correlated eigenproblems {P_i} flows from the simulations back to the mathematical model and the algorithmic structure.]

7 Sequences of eigenproblems in Density Functional Theory: A glimpse of the results
[Figure, Speedup: speed-up vs. iteration index for Nev=256 and three distinct matrix sizes (NaCl n=3893, NaCl n=6217, and a third NaCl size whose value is missing).]
[Figure, Scalability: BChFSI strong scalability, speed-up vs. number of threads, for AuAg n=8970, NaCl n=6217, CaFeAs n=2612, with the ideal speed-up for reference.]
[Table: Efficiency. Percentage of peak performance per self-consistent cycle; the numerical values are missing.]
- Fast: a great improvement of the execution time
- Efficient: an extensive use of computing resources
- Scalable: ideal use of parallel computing architectures

8 Outline
1. Stating the problem: how sequences of generalized eigenproblems arise in all-electron computations
2. Algorithmic choice ⇐ Eigenvectors angle evolution
3. Tailoring the algorithm: Chebyshev Filtered Sub-space Iteration method (ChFSI)
4. Physics-driven performance boost: numerical results

10 The Foundations: Investigative framework
Quantum Mechanics and its ingredients
Hamiltonian:
$$H = -\frac{\hbar^2}{2m}\sum_{i=1}^{n}\nabla_i^2 - \sum_{i=1}^{n}\sum_{\alpha}\frac{Z_\alpha}{|x_i - a_\alpha|} + \sum_{i<j}\frac{1}{|x_i - x_j|}$$
Wavefunction: $\Phi(x_1;s_1,x_2;s_2,\dots,x_n;s_n)$, with
$$\Phi : \left(\mathbb{R}^3 \times \{\pm\tfrac{1}{2}\}\right)^n \longrightarrow \mathbb{R},$$
a high-dimensional anti-symmetric function that describes the orbitals of atoms and molecules. In the Born-Oppenheimer approximation, it is the solution of the Electronic Schrödinger Equation
$$H\,\Phi(x_1;s_1,x_2;s_2,\dots,x_n;s_n) = E\,\Phi(x_1;s_1,x_2;s_2,\dots,x_n;s_n)$$

11 Density Functional Theory (DFT)
Hohenberg-Kohn theorem (1964):
- one-to-one correspondence $n(r) \leftrightarrow V_0(r) \Longrightarrow V_0(r) = V_0(r)[n]$
- $\exists!$ a functional $E[n]$ : $E_0 = \min_n E[n]$
1. $\Phi(x_1;s_1,x_2;s_2,\dots,x_n;s_n) = \Lambda_{i,a}\,\phi_a(x_i;s_i)$
2. Charge density $n(r) = \sum_a f_a\,|\phi_a(r)|^2$
3. In the Schrödinger equation the exact Coulomb interaction is substituted with an effective potential $V_0(r) = V_I(r) + V_H(r) + V_{xc}(r)$
The high-dimensional Schrödinger equation translates into a set of coupled non-linear low-dimensional self-consistent Kohn-Sham (KS) equations:
$$\forall a \ \text{solve} \quad \hat H_{KS}\,\phi_a(r) = \left(-\frac{\hbar^2}{2m}\nabla^2 + V_0(r)\right)\phi_a(r) = \varepsilon_a\,\phi_a(r)$$

12 Kohn-Sham scheme and DFT: Self-consistent cycle
Typically this set of equations is solved using an iterative self-consistent cycle:
1. Initial guess $n_{\rm init}(r)$.
2. Compute the KS potential $V_0(r)[n]$.
3. Solve the KS equations $\hat H_{KS}\,\phi_a(r) = \varepsilon_a\,\phi_a(r)$.
4. Compute the new density $n(r) = \sum_a f_a\,|\phi_a(r)|^2$.
5. Converged? Check $|n^{(l+1)} - n^{(l)}| < \eta$. If No, go back to step 2; if Yes, OUTPUT: energy, forces, ...
In practice this iterative cycle is much more computationally challenging and requires some form of broadly defined discretization. A common way of discretizing the KS equations is to expand the wave functions $\phi_a(r)$ on a basis set.

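The control flow of the self-consistent cycle can be sketched on a toy problem. The following Python/NumPy fragment is only an illustration under loudly stated assumptions: the "Hamiltonian" is a hypothetical 1D chain with nearest-neighbour hopping and a harmonic well, the density-dependent potential is a made-up local term `U * n`, all occupations are set to 1, and simple linear mixing of densities is used. None of these choices comes from the talk; only the loop structure (potential, eigenproblem, new density, convergence check) mirrors the cycle above.

```python
import numpy as np

def scf_cycle(n_sites=20, n_occ=4, U=0.5, mix=0.3, eta=1e-8, max_iter=500):
    """Toy self-consistent cycle on a 1D chain (hypothetical model, not FLAPW)."""
    idx = np.arange(n_sites)
    # Density-independent part: nearest-neighbour hopping plus a harmonic well.
    H0 = -(np.eye(n_sites, k=1) + np.eye(n_sites, k=-1))
    H0 += np.diag(0.05 * (idx - (n_sites - 1) / 2.0) ** 2)
    # Initial guess: uniform density carrying n_occ electrons.
    n = np.full(n_sites, n_occ / n_sites)
    for it in range(1, max_iter + 1):
        V = U * n                                     # potential as a functional of n
        eps, phi = np.linalg.eigh(H0 + np.diag(V))    # solve the eigenproblem
        n_new = np.sum(phi[:, :n_occ] ** 2, axis=1)   # new density, occupations f_a = 1
        if np.max(np.abs(n_new - n)) < eta:           # convergence check
            return n_new, eps[:n_occ], it
        n = (1.0 - mix) * n + mix * n_new             # linear mixing for the next cycle
    raise RuntimeError("self-consistent cycle did not converge")

density, energies, iterations = scf_cycle()
```

Note that every pass through the loop solves a fresh eigenproblem whose matrix differs only slightly from the previous one once the density starts to settle.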
13 Introduction to FLAPW: Linearized augmented plane wave (LAPW) in a Muffin Tin (MT) geometry
LAPW expansion ($k$: Bloch vector, $\nu$: band index):
$$\phi_{k,\nu}(r) = \sum_{|G+k|\le G_{\max}} c^{G}_{k,\nu}\,\psi_G(k,r)$$
$$\psi_G(k,r) = \begin{cases} \dfrac{1}{\sqrt{\Omega}}\,e^{i(k+G)\cdot r} & \text{Interstitial (I)} \\[2ex] \displaystyle\sum_{\alpha,l,m}\left[a^{\alpha,G}_{lm}(k)\,u^{\alpha}_l(r) + b^{\alpha,G}_{lm}(k)\,\dot u^{\alpha}_l(r)\right]Y_{lm}(\hat r_\alpha) & \text{Muffin Tin (MT)} \end{cases}$$
Boundary conditions: continuity of the wavefunction and its derivative at the MT boundary determines $a^{\alpha,G}_{lm}(k)$ and $b^{\alpha,G}_{lm}(k)$.

14 Introduction to FLAPW: Linearized augmented plane wave (LAPW) in a Muffin Tin (MT) geometry ... cont.
The radial functions $u^{\alpha}_l(r)$ are the solutions of the atomic Schrödinger equations where the potential retains only the spherical part:
$$\hat H^{\alpha}_{sph}\,u^{\alpha}_l(r) = \left[-\frac{\hbar^2}{2m}\frac{\partial^2}{\partial r^2} + \frac{\hbar^2}{2m}\frac{l(l+1)}{r^2} + V^{\alpha}_{sph}(r)\right] r\,u^{\alpha}_l(r) = E_l\,u^{\alpha}_l(r)$$
where $\dot u^{\alpha}_l$ can be determined from the energy derivative of the Schrödinger-like equation
$$\hat H^{\alpha}_{sph}\,\dot u^{\alpha}_l = E_l\,\dot u^{\alpha}_l + u^{\alpha}_l$$
- $E_l$ is a parameter and it is predetermined by optimization
- like the plane wave case there is a cut-off $G_{\max}$ (typically [value missing])
- unlike the plane waves there is also a cut-off (for open-shell atoms) on $l_{\max}$ (typically 8)
- unlike the plane waves this set of basis functions is overcomplete: it is NOT an orthogonal set
- a new set of basis functions is computed at each self-consistent cycle

15 Introduction to FLAPW: Generalized eigenvalue problems
The expansion of $\phi_{k,\nu}(r)$ in terms of the LAPW basis set is then inserted in the KS equations
$$\sum_{G'} \psi^{*}_{G}(k,r)\,\hat H_{KS}\,c^{G'}_{k,\nu}\,\psi_{G'}(k,r) = \lambda_{k\nu}\sum_{G'} \psi^{*}_{G}(k,r)\,c^{G'}_{k,\nu}\,\psi_{G'}(k,r),$$
and, by defining the matrix entries for the left and right hand side respectively as Hamiltonian $A_k$ and overlap $B_k$ matrices,
$$[A_k \;\; B_k]_{GG'} = \sum_{\alpha}\int dr\;\psi^{*}_{G}(k,r)\,[\hat H_{KS} \;\; \hat 1]\,\psi_{G'}(k,r),$$
one arrives at generalized eigenvalue equations parametrized by $k$:
$$P_k : \quad \sum_{G'} (A_k)_{GG'}\,c^{G'}_{k\nu} = \lambda_{k\nu}\sum_{G'} (B_k)_{GG'}\,c^{G'}_{k\nu}$$
or, in compact form, $A_k x_i = \lambda_i B_k x_i$, with $x_i \doteq c^{G}_{k\nu}$.

16 Discretized Kohn-Sham scheme: Self-consistent cycle
1. Initial guess $n_{\rm start}(r)$.
2. Compute the KS potential $V_0(r)[n]$.
3. Solve a set of eigenproblems $P_{k_1} \dots P_{k_N}$.
4. Compute the new density $n(r) = \sum_{k,\nu} f_{k,\nu}\,|\phi_{k,\nu}(r)|^2$.
5. Converged? Check $|n^{(l+1)} - n^{(l)}| < \eta$. If No, go back to step 2; if Yes, OUTPUT: energy, ...
Observations:
1. every $P_k : Ax = \lambda Bx$ is a generalized eigenvalue problem;
2. A and B are DENSE and Hermitian (B is also positive definite);
3. $P_k$s with different k index have different sizes and are independent from each other;
4. k = 1:[upper bound missing]; i = 1:20-50.

17 A (mostly) converging process: The traditional approach
Initialization: at every iteration of the cycle all numerical quantities are entirely re-computed.
- A new set of basis functions is assembled at each iteration cycle (Math/Sim)
- The entries of the matrices A and B are re-initialized at each iteration cycle (Algorithm/Sim)
Eigenproblems:
- Each eigenproblem $P^{(l)} : A^{(l)}x = \lambda B^{(l)}x$ at iteration $l$ is solved in total independence from the eigenproblem $P^{(l-1)}$ of the previous iteration (Algorithm/Sim)
Convergence: starting with an electron density close enough to the one minimizing the energy $E_0$, convergence is likely to be reached within a few tens of iterations. Unfortunately there is no general theorem establishing the conditions for convergence.
- The number of steps is still uncertain: it depends on the material and the initial guess (area of convergence) (Math/Algorithm)
- Successive densities $n(r)$ undergo relatively small, decreasing oscillations (Math/Algorithm & Sim)

20 FLAPW self-consistent cycle: Computational costs
[Flow diagram of the FLAPW self-consistent cycle. Boxes, in cycle order:
- Initialization: spherical $V_0(r)$, $n_0(r)$
- Pseudo-charge method: calculation of the full potential $V(r)[n]$
- Determining the LAPW wavefunctions $\psi_G(k,r)$: $a^{G}_{lm}$ and $b^{G}_{lm}$ coefficients, 3-D stars, 2-D stars, lattice harmonics
- Matrices generation: $A_k = \langle\psi(k)|H|\psi(k)\rangle$, $B_k = \langle\psi(k)|S|\psi(k)\rangle$
- Eigenproblems: distribution and setup
- Generalized eigenproblems solution $A_k x = \lambda B_k x$
- Fermi energy calculation, eigenpairs selection
- Calculation of the new charge density $n(r)$
- Convergence check and mixing of $n(r)$ for the next cycle
Time-usage labels shown in the diagram: 40%, 5-10%, <1%. The transcription does not preserve which box each label is attached to.]

21 Sequences of eigs: an ALGORITHM/SIM case
Sequences of eigenproblems
- Consider the set of generalized eigenproblems $P^{(1)} \dots P^{(l)} P^{(l+1)} \dots P^{(N)}$ not as a set of disjoint problems $(P)_N$, but as a sequence $\{P^{(l)}\}$;
- Could this sequence $\{P^{(l)}\}$ of eigenproblems evolve following a convergence pattern in line with the convergence of $n(r)$?
REVERSE SIMULATION method
- numerical simulations analyzed employing a parameter-based inverse problem method;
- collected data on deviation angles between eigenvectors of adjacent eigenproblems;
- identified evolutions of eigenvectors along the sequence.

22 Angle evolution: fixed k
Example: a metallic compound at fixed k.
[Figure: evolution of the subspace angle for the eigenvectors of k-point 1 and the lowest 75 eigenpairs, AuAg; y-axis: angle between eigenvectors of adjacent iterations (logarithmic scale), x-axis: iterations (2 to 22).]

23 Correlation and its exploitation
Correlation between successive eigenvectors $x^{(l-1)}$ and $x^{(l)}$:
- Angles decrease monotonically with some oscillation
- The majority of angles are small after the first few iterations
Note: the correlation is not implied by the mathematical model; it emerges from the numerical analysis of the simulation.
ALGORITHM/SIM: the stage is favorable to an iterative eigensolver where the eigenvectors of $P^{(l-1)}$ are fed to the solver for $P^{(l)}$.
Next stages of the investigation:
1. Develop a block iterative eigensolver that can exploit the correlation
2. Investigate whether approximate eigenvectors can speed up the iterative solver of choice
3. Understand whether such an iterative method can be competitive with direct methods for dense problems

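The deviation angle between corresponding eigenvectors of two adjacent problems can be measured as sketched below. This is a minimal illustration, not the analysis code of the talk: it uses standard symmetric eigenproblems instead of generalized ones, and the two matrices are hypothetical stand-ins (a diagonal matrix and a small random symmetric perturbation of it) for the eigenproblems at iterations l-1 and l.

```python
import numpy as np

def eigenvector_angles(A_prev, A_next, nev):
    """Angles between the nev lowest eigenvectors of two adjacent problems."""
    _, X_prev = np.linalg.eigh(A_prev)
    _, X_next = np.linalg.eigh(A_next)
    # |<x_i^(l-1), x_i^(l)>| for each pair of normalized eigenvectors;
    # the absolute value removes the arbitrary sign (phase) of eigenvectors.
    overlaps = np.abs(np.sum(X_prev[:, :nev].conj() * X_next[:, :nev], axis=0))
    return np.arccos(np.clip(overlaps, 0.0, 1.0))

rng = np.random.default_rng(0)
n = 8
A_prev = np.diag(np.arange(1.0, n + 1))      # stand-in for iteration l-1
E = rng.standard_normal((n, n))
A_next = A_prev + 1e-3 * (E + E.T)           # stand-in for iteration l
angles = eigenvector_angles(A_prev, A_next, nev=4)
```

Small angles mean that the eigenvectors of the previous problem are good starting vectors for an iterative solver on the next one. Pairing eigenvectors by index, as done here, is only meaningful when the eigenvalues are well separated.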
26 Algorithmic digression
Direct solvers. Iterative solvers.
With eigenvalues ordered as $\lambda_1 > \lambda_2 > \lambda_3 > \dots$, eigenpairs $A x_j = \lambda_j x_j$, and a generic vector $v = \sum_j \gamma_j x_j$:
$$A v = \sum_j \lambda_j \gamma_j x_j$$
$$A^k v = \sum_j \lambda_j^k \gamma_j x_j = \lambda_1^k\left[\gamma_1 x_1 + \sum_{j\ge 2}\gamma_j\left(\frac{\lambda_j}{\lambda_1}\right)^k x_j\right]$$
Rate of convergence: determined by the magnitude of $\lambda_2/\lambda_1$.

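The expansion above is the basis of what is usually called power iteration. A minimal sketch follows, with a small hypothetical diagonal test matrix chosen so that the dominant eigenvalue (4) and the ratio of the two largest eigenvalues (0.5) are known in advance.

```python
import numpy as np

def power_iteration(A, v0, n_steps):
    """Repeatedly apply A and normalize; returns (Rayleigh quotient, vector)."""
    v = v0 / np.linalg.norm(v0)
    for _ in range(n_steps):
        w = A @ v
        v = w / np.linalg.norm(w)   # normalization keeps the iterate bounded
    return v @ A @ v, v

A = np.diag([4.0, 2.0, 1.0])        # hypothetical test matrix
lam, v = power_iteration(A, np.ones(3), 60)
```

Each step shrinks the unwanted components by a factor of about 0.5 relative to the dominant one, which is why a polynomial that separates the spectrum more sharply than a plain power of A converges faster.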
27 Algorithmic digression
Direct solvers. Iterative solvers.
Dense matrices. Sparse matrices.

32 Chebyshev Filtered Sub-space Iteration method (ChFSI)
Our iterative algorithm has to comply with two essential properties:
1. The ability to receive as input a sizable set $Z_0$ of approximate eigenvectors;
2. The capacity to solve simultaneously for a substantial portion of eigenpairs.
ChFSI constitutes the natural choice:
- it accepts the full set of multiple starting vectors;
- it avoids stalling when facing small clusters of eigenvalues;
- when augmented with polynomial accelerators it has a much faster convergence rate;
- converged eigenvectors can be easily locked.

33 ChFSI pseudocode
INPUT: Hamiltonian $H$, approximate eigenpairs $(\Lambda, Z_0)$, TOL, DEG.
OUTPUT: Wanted eigenpairs $W$.
1. Lanczos step. Identify the bounds for the interval to be filtered out.
REPEAT UNTIL CONVERGENCE:
2. Chebyshev filter. Filter a block of vectors $W \leftarrow Z_0$.
3. QR decomposition. Re-orthogonalize the vectors output by the filter; $W = QR$.
4. Compute the Rayleigh quotient $G = Q^{\dagger} H Q$.
5. Compute the primitive Ritz pairs $(\Lambda, Y)$ by solving $GY = Y\Lambda$.
6. Compute the approximate Ritz pairs $(\Lambda, W \leftarrow QY)$.
7. Check which of the Ritz vectors have converged.
8. Deflate and lock the converged vectors.
END REPEAT

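The pseudocode above can be turned into a compact NumPy sketch. It is an illustration under stated assumptions, not the author's implementation: the Lanczos step is replaced by a cheap upper bound on the spectrum (the infinity norm of H), the lower end of the filtered interval is taken as the largest Ritz value of the current block, deflation and locking (step 8) are omitted, and the function names `chebyshev_filter` and `chfsi` as well as the test matrix are hypothetical. The filter follows the standard scaled three-term Chebyshev recurrence.

```python
import numpy as np

def chebyshev_filter(H, Z, lam1, c, e, deg):
    """Scaled degree-`deg` Chebyshev filter damping the interval [c - e, c + e]."""
    Hc = H - c * np.eye(H.shape[0])
    sigma1 = e / (lam1 - c)
    sigma, Z_prev, Z_curr = sigma1, Z, (sigma1 / e) * (Hc @ Z)
    for _ in range(deg - 1):
        sigma_next = 1.0 / (2.0 / sigma1 - sigma)
        Z_next = (2.0 * sigma_next / e) * (Hc @ Z_curr) - sigma_next * sigma * Z_prev
        sigma, Z_prev, Z_curr = sigma_next, Z_curr, Z_next
    return Z_curr

def chfsi(H, Z0, nev, deg=10, tol=1e-8, max_iter=100):
    """Simplified ChFSI: no Lanczos, no locking (assumptions of this sketch)."""
    beta = np.linalg.norm(H, np.inf)          # stand-in for step 1: upper bound
    W, _ = np.linalg.qr(Z0)
    theta = np.linalg.eigvalsh(W.conj().T @ H @ W)   # Ritz values of the input block
    for _ in range(max_iter):
        alpha, lam1 = theta[-1], theta[0]     # filter out [alpha, beta]
        c, e = (beta + alpha) / 2.0, (beta - alpha) / 2.0
        W = chebyshev_filter(H, W, lam1, c, e, deg)      # step 2
        Q, _ = np.linalg.qr(W)                           # step 3
        G = Q.conj().T @ H @ Q                           # step 4
        theta, Y = np.linalg.eigh(G)                     # step 5
        W = Q @ Y                                        # step 6
        res = np.linalg.norm(H @ W - W * theta, axis=0)  # step 7
        if np.all(res[:nev] < tol):
            return theta[:nev], W[:, :nev]
    raise RuntimeError("ChFSI sketch did not converge")

rng = np.random.default_rng(0)
n = 60
M = rng.standard_normal((n, n))
H = np.diag(np.arange(1.0, n + 1)) + 0.01 * (M + M.T)   # hypothetical test matrix
vals, vecs = chfsi(H, rng.standard_normal((n, 8)), nev=4)
```

The block is deliberately larger than `nev` (8 columns for 4 wanted pairs) so that the wanted eigenvalues always lie below the filtered interval. Passing the eigenvectors of a previous, nearby problem as `Z0` instead of random vectors is the use case the talk targets.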
34 The core of the algorithm: Chebyshev filter
Chebyshev polynomials
The Chebyshev polynomial $C_m$ of the first kind of order $m$ is defined as
$$C_m(x) = \begin{cases} \cos(m\,\arccos(x)), & x \in [-1,1]; \\ \cosh(m\,\mathrm{arccosh}(x)), & x > 1. \end{cases}$$
Three-term recurrence relation:
$$C_{m+1}(x) = 2x\,C_m(x) - C_{m-1}(x); \quad m \in \mathbb{N}, \quad C_0(x) = 1, \quad C_1(x) = x$$
[Figure: plots of the Chebyshev polynomials of degree 5, 10, 15 and 20 as functions of x.]

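As a quick check of the definitions above, the three-term recurrence can be evaluated directly and compared with the closed forms. The function name `chebyshev` is just an illustrative choice.

```python
import math

def chebyshev(m, x):
    """Evaluate C_m(x) with the three-term recurrence C_{m+1} = 2x C_m - C_{m-1}."""
    c_prev, c_curr = 1.0, x          # C_0(x) = 1, C_1(x) = x
    if m == 0:
        return c_prev
    for _ in range(m - 1):
        c_prev, c_curr = c_curr, 2.0 * x * c_curr - c_prev
    return c_curr

inside = chebyshev(5, 0.3)    # agrees with cos(5 * arccos(0.3)), bounded by 1
outside = chebyshev(4, 2.0)   # agrees with cosh(4 * arccosh(2.0)) = 97
```

The contrast between the two values is the point of the filter: the polynomial stays bounded by 1 in magnitude inside [-1, 1] and grows very quickly outside of it.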
35 The core of the algorithm: Chebyshev filter
The basic principle
Theorem. Let $\gamma > 1$ and let $\mathbb{P}_m$ denote the set of polynomials of degree smaller than or equal to $m$. Then the extremum
$$\min_{p \in \mathbb{P}_m,\; p(\gamma)=1}\;\max_{t \in [-1,1]} |p(t)|$$
is reached by
$$p_m(t) \doteq \frac{C_m(t)}{C_m(\gamma)}.$$
A generic vector $v = \sum_{i=1}^{n} s_i x_i$ is very quickly aligned in the direction of the eigenvector corresponding to the extremal eigenvalue $\lambda_1$:
$$v^m = p_m(A)\,v = \sum_{i=1}^{n} s_i\,p_m(A)\,x_i = \sum_{i=1}^{n} s_i\,p_m(\lambda_i)\,x_i = s_1 x_1 + \sum_{i=2}^{n} s_i\,\frac{C_m\!\left(\frac{\lambda_i - c}{e}\right)}{C_m\!\left(\frac{\lambda_1 - c}{e}\right)}\,x_i \;\sim\; s_1 x_1$$

36 The core of the algorithm: Chebyshev filter
The pseudocode
A simple linear transformation maps $[-1,1] \to [\alpha,\beta] \subset \mathbb{R}$:
$$c = \frac{\beta + \alpha}{2} \;\; \text{(center of the interval)}, \qquad e = \frac{\beta - \alpha}{2} \;\; \text{(width of the interval)}.$$
INPUT: Hamiltonian $H$, approximate eigenvectors $Z_0$, lowest eigenvalue $\lambda_1$, parameters for the interval $c$, $e$, DEG.
OUTPUT: Filtered vectors $W$.
1. $\sigma_1 \leftarrow e/(\lambda_1 - c)$
2. $Z_1 \leftarrow \frac{\sigma_1}{e}(H - cI_n)Z_0$
FOR $i = 1 \to$ DEG $- 1$:
3. $\sigma_{i+1} \leftarrow 1/(2/\sigma_1 - \sigma_i)$
4. $Z_{i+1} \leftarrow \frac{2\sigma_{i+1}}{e}(H - cI_n)Z_i - \sigma_{i+1}\sigma_i Z_{i-1}$
END FOR.

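The filter recurrence can be checked numerically on a matrix whose eigenvalues are known. In this sketch the Hamiltonian is a hypothetical diagonal matrix with eigenvalues 0 to 4, the interval to be damped is assumed to be [1.5, 4], and the wanted eigenvalue is 0; with a diagonal matrix the filter acts on each row as multiplication by the scaled polynomial, which makes the effect easy to read off.

```python
import numpy as np

def chebyshev_filter(H, Z0, lam1, c, e, deg):
    """Apply the scaled Chebyshev filter of degree `deg` to the block Z0."""
    Hc = H - c * np.eye(H.shape[0])
    sigma1 = e / (lam1 - c)
    sigma = sigma1
    Z_prev = Z0
    Z_curr = (sigma1 / e) * (Hc @ Z0)               # Z_1
    for _ in range(deg - 1):
        sigma_next = 1.0 / (2.0 / sigma1 - sigma)   # sigma_{i+1}
        Z_next = (2.0 * sigma_next / e) * (Hc @ Z_curr) - sigma_next * sigma * Z_prev
        sigma, Z_prev, Z_curr = sigma_next, Z_curr, Z_next
    return Z_curr

H = np.diag([0.0, 1.0, 2.0, 3.0, 4.0])   # hypothetical test Hamiltonian
Z0 = np.ones((5, 2))
alpha, beta, lam1, deg = 1.5, 4.0, 0.0, 10
c, e = (beta + alpha) / 2.0, (beta - alpha) / 2.0
W = chebyshev_filter(H, Z0, lam1, c, e, deg)
```

The scaling by the sigma factors normalizes the polynomial to 1 at the lowest eigenvalue, so the wanted component passes through unchanged while the components inside [alpha, beta] are strongly damped and nothing overflows for large degrees.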
38 Experimental tests setup
Solving with a ChFSI version implemented in C:
- Approx. vs. random vectors fed to ChFSI against iteration index;
- Parallel ChFSI vs. direct methods against iteration index.
Matrix sizes: 2,600 to 13,300.
B is in general almost singular (ill-conditioned). Examples: size(A) = 50 ⇒ $\kappa(A) \approx 10^4$; size(A) = 500 ⇒ $\kappa(A) \approx 10^7$.
We used the standard form for the problem: $Ax = \lambda Bx \;\to\; A'y = \lambda y$ with $A' = L^{-1} A L^{-T}$ and $y = L^{T}x$.
Tests were performed on JUROPA using 1 node with 8 cores:
- 2 Intel Xeon 5570 (Nehalem-EP) quad-core processors, 2.93 GHz;
- 24 GB/node;
- THEORETICAL PEAK PERFORMANCE/CORE = 11.71 GFLOPS.
Minimum absolute tolerance of residuals: [value missing].
All numerical data are MEDIAN values over 12 distinct measurements.

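The reduction to standard form can be sketched as follows, assuming (as is standard, though not spelled out on the slide) that L is the lower-triangular Cholesky factor of the overlap matrix, B = L L^H. The helper names and the small random test matrices are hypothetical. General-purpose `np.linalg.solve` is used for brevity; a production code would use triangular solves.

```python
import numpy as np

def to_standard_form(A, B):
    """Return A' = L^{-1} A L^{-H} and L, where B = L L^H (Cholesky)."""
    L = np.linalg.cholesky(B)                          # lower-triangular factor
    Y = np.linalg.solve(L, A)                          # L^{-1} A
    A_std = np.linalg.solve(L, Y.conj().T).conj().T    # (L^{-1} A) L^{-H}
    return A_std, L

def solve_generalized(A, B):
    """Solve A x = lambda B x through the standard problem A' y = lambda y."""
    A_std, L = to_standard_form(A, B)
    lam, Y = np.linalg.eigh(A_std)
    X = np.linalg.solve(L.conj().T, Y)                 # back-transform: x = L^{-H} y
    return lam, X

rng = np.random.default_rng(1)
n = 6
M = rng.standard_normal((n, n))
A = M + M.T                                            # symmetric "Hamiltonian"
N = rng.standard_normal((n, n))
B = N @ N.T + n * np.eye(n)                            # symmetric positive definite "overlap"
lam, X = solve_generalized(A, B)
```

The back-transformed eigenvectors come out B-orthonormal. When B is nearly singular, as the slide warns, the Cholesky factor is ill-conditioned and the accuracy of this reduction degrades accordingly.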
39 Speed-up for sequential ChFSI
Na15Cl14Li with increasing $G_{\max}$ = 3.0, 3.5, [third value missing].
[Figure: speed-up vs. iteration index for Nev=256 and three distinct matrix sizes (NaCl n=3893, NaCl n=6217, and a third NaCl size whose value is missing).]

40 Speed-up for sequential ChFSI
Au98Ag10 with increasing $G_{\max}$ = 3.0, [second value missing].
[Figure: speed-up vs. iteration index for Nev=972 and 2 distinct matrix sizes (AuAg n=5638 and a second AuAg size whose value is missing).]

41 Sequential to Parallel: what and how?
Fraction of computing time for sequential ChFSI, extracted from the AuAg system (n=8970, nev=972) at iteration 23.
[Pie chart: Chebyshev filter 90%; Lanczos < 1%; Rayleigh-Ritz and residuals convergence account for the remaining 4% and 6% slices (the transcription does not preserve which slice is which).]

42 Sequential to Parallel: what and how?
Speed-up of multi-threaded BLAS ChFSI vs. sequential ChFSI over 8 cores when fed approximate eigenvectors.
[Figure: speed-up of multi-threaded ChFSI over sequential ChFSI vs. iteration index, for AuAg n=8970 and a NaCl system whose size is missing.]

43 Sequential to Parallel: what and how?
Speed-up of sequential ChFSI and multi-threaded BLAS ChFSI.
[Figure: speed-up of approximate vs. random vectors vs. iteration index, for AuAg on multiple cores and AuAg on 1 core; the matrix sizes and the core count are missing.]

44 Efficiency
Ratio of measured performance with respect to theoretical peak performance.
Table: Efficiency for Na15Cl14Li (n=9273). Columns: self-consistent cycle, Full Algorithm, Chebyshev filter. The cycle indices and the Full Algorithm percentages are missing; the twelve Chebyshev filter values, in row order, are:
91.4%, 90.9%, 90.4%, 90.8%, 91.5%, 91.6%, 91.6%, 92.1%, 90.8%, 91.4%, 90.9%, 90.8%

46 Recapping
1. Approximate vs. Random:
- sequential ChFSI achieves speed-ups in the range 1.5X to 3.5X;
- the multi-threaded version of ChFSI achieves speed-ups up to 5X.
2. Multi-threaded ChFSI:
- by just using the multi-threaded version of BLAS, ChFSI achieves speed-ups above 7X;
- the larger the size of the eigenproblem, the better ChFSI performs.
3. Performance: extensive use of the BLAS library in the filter facilitates close-to-optimal performance.
UNANSWERED QUESTIONS:
- Can we do better on shared memory architectures? YES: OpenMP.
- Can we be faster than the direct methods? It depends on the percentage of the eigenspectrum sought after and on the iteration index.

47 Parallelizing the filter with OpenMP
Input: Hamiltonian H, vectors to be filtered Z, DEG, c, e.
Output: Filtered vectors W.
for i = 1 to DEG do
    Compute α. Compute β.
    W = αHZ + βW.
    SWAP(Z, W).
end for
SWAP(Z, W).
[Figure: Matrix-matrix multiplication scheme: Hamiltonian × vectors to be filtered = filtered vectors.]

48 OpenMP ChFSI vs multi-threaded ChFSI
[Figure: time to completion for AuAg (n=13379), OpenMP ChFSI vs. multi-threaded BLAS ChFSI; y-axis: CPU time (seconds), x-axis: iteration index.]

49 Scalability
OpenMP ChFSI obtains almost ideal strong scalability: the larger the better.
[Figure: BChFSI strong scalability, speed-up vs. number of threads (up to 8), for AuAg n=8970, NaCl n=6217, CaFeAs n=2612, with the ideal speed-up for reference.]

50 OpenMP ChFSI vs Direct eigensolvers on shared memory architectures
Comparison with multi-threaded LAPACK MR3 and MKL multi-threaded LAPACK MR3.
[Figure: time for LAPACK + multi-threaded BLAS, MKL LAPACK + multi-threaded BLAS, and OpenMP ChFSI for AuAg (n=13379), solving for the lowest 7.3% of the eigenspectrum; y-axis: CPU time (seconds), x-axis: iteration index.]

51 OpenMP ChFSI vs Direct eigensolvers on shared memory architectures
Comparison with multi-threaded LAPACK MR3 and MKL multi-threaded LAPACK MR3.
[Figure: time for LAPACK + multi-threaded BLAS, MKL LAPACK + multi-threaded BLAS, and OpenMP ChFSI for NaCl (n=9273), solving for the lowest 2.8% of the eigenspectrum; y-axis: CPU time (seconds), x-axis: iteration index.]

52 Conclusions and future work
CONCLUSIONS
- Feeding the eigenvectors of $P^{(l-1)}$ to a block iterative solver like ChFSI speeds up the iterative solver for $P^{(l)}$;
- The improved use of computing resources, together with strong scalability, will give access to larger physical systems;
- The algorithmic structure of FLAPW needed to be re-thought: reverse simulation can substantially influence the computational paradigm of an application.
ONGOING AND FUTURE WORK
- ChFSI can be extended and improved:
  1. Finalizing the filter optimization by adjusting the degree of the polynomial so as to just achieve the required eigenvector residuals;
  2. Ongoing work to parallelize ChFSI for distributed memory architectures ⇒ Elemental;
  3. Parallelization of ChFSI for GPUs by using OpenACC directives.
- Analysis of the structure of the entries of A and B across adjacent iterations seems to suggest exploiting low-rank updates for sequences of eigenproblems ⇒ Hierarchical matrices.

More informationEigenvalue Problems CHAPTER 1 : PRELIMINARIES
Eigenvalue Problems CHAPTER 1 : PRELIMINARIES Heinrich Voss voss@tu-harburg.de Hamburg University of Technology Institute of Mathematics TUHH Heinrich Voss Preliminaries Eigenvalue problems 2012 1 / 14
More informationab initio Electronic Structure Calculations
ab initio Electronic Structure Calculations New scalability frontiers using the BG/L Supercomputer C. Bekas, A. Curioni and W. Andreoni IBM, Zurich Research Laboratory Rueschlikon 8803, Switzerland ab
More informationECS130 Scientific Computing Handout E February 13, 2017
ECS130 Scientific Computing Handout E February 13, 2017 1. The Power Method (a) Pseudocode: Power Iteration Given an initial vector u 0, t i+1 = Au i u i+1 = t i+1 / t i+1 2 (approximate eigenvector) θ
More informationAccelerating computation of eigenvectors in the nonsymmetric eigenvalue problem
Accelerating computation of eigenvectors in the nonsymmetric eigenvalue problem Mark Gates 1, Azzam Haidar 1, and Jack Dongarra 1,2,3 1 University of Tennessee, Knoxville, TN, USA 2 Oak Ridge National
More informationarxiv: v1 [cs.ce] 31 Oct 2016
Hybrid CPU-GPU generation of the Hamiltonian and Overlap matrices in FLAPW methods arxiv:1611.00606v1 [cs.ce] 31 Oct 2016 Diego Fabregat-Traver, Davor Davidović, Markus Höhnerbach, and Edoardo di Napoli
More informationAccelerating computation of eigenvectors in the dense nonsymmetric eigenvalue problem
Accelerating computation of eigenvectors in the dense nonsymmetric eigenvalue problem Mark Gates 1, Azzam Haidar 1, and Jack Dongarra 1,2,3 1 University of Tennessee, Knoxville, TN, USA 2 Oak Ridge National
More informationJacobi-Davidson Eigensolver in Cusolver Library. Lung-Sheng Chien, NVIDIA
Jacobi-Davidson Eigensolver in Cusolver Library Lung-Sheng Chien, NVIDIA lchien@nvidia.com Outline CuSolver library - cusolverdn: dense LAPACK - cusolversp: sparse LAPACK - cusolverrf: refactorization
More informationDUAL REGULARIZED TOTAL LEAST SQUARES SOLUTION FROM TWO-PARAMETER TRUST-REGION ALGORITHM. Geunseop Lee
J. Korean Math. Soc. 0 (0), No. 0, pp. 1 0 https://doi.org/10.4134/jkms.j160152 pissn: 0304-9914 / eissn: 2234-3008 DUAL REGULARIZED TOTAL LEAST SQUARES SOLUTION FROM TWO-PARAMETER TRUST-REGION ALGORITHM
More informationELSI: A Unified Software Interface for Kohn-Sham Electronic Structure Solvers
ELSI: A Unified Software Interface for Kohn-Sham Electronic Structure Solvers Victor Yu and the ELSI team Department of Mechanical Engineering & Materials Science Duke University Kohn-Sham Density-Functional
More informationA model leading to self-consistent iteration computation with need for HP LA (e.g, diagonalization and orthogonalization)
A model leading to self-consistent iteration computation with need for HP LA (e.g, diagonalization and orthogonalization) Schodinger equation: Hψ = Eψ Choose a basis set of wave functions Two cases: Orthonormal
More informationSome Geometric and Algebraic Aspects of Domain Decomposition Methods
Some Geometric and Algebraic Aspects of Domain Decomposition Methods D.S.Butyugin 1, Y.L.Gurieva 1, V.P.Ilin 1,2, and D.V.Perevozkin 1 Abstract Some geometric and algebraic aspects of various domain decomposition
More informationSolid State Theory: Band Structure Methods
Solid State Theory: Band Structure Methods Lilia Boeri Wed., 11:15-12:45 HS P3 (PH02112) http://itp.tugraz.at/lv/boeri/ele/ Plan of the Lecture: DFT1+2: Hohenberg-Kohn Theorem and Kohn and Sham equations.
More informationIs there life after the Lanczos method? What is LOBPCG?
1 Is there life after the Lanczos method? What is LOBPCG? Andrew V. Knyazev Department of Mathematics and Center for Computational Mathematics University of Colorado at Denver SIAM ALA Meeting, July 17,
More informationDomain Decomposition-based contour integration eigenvalue solvers
Domain Decomposition-based contour integration eigenvalue solvers Vassilis Kalantzis joint work with Yousef Saad Computer Science and Engineering Department University of Minnesota - Twin Cities, USA SIAM
More informationDensity Functional Theory
Density Functional Theory March 26, 2009 ? DENSITY FUNCTIONAL THEORY is a method to successfully describe the behavior of atomic and molecular systems and is used for instance for: structural prediction
More informationNumerical Methods I Non-Square and Sparse Linear Systems
Numerical Methods I Non-Square and Sparse Linear Systems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 September 25th, 2014 A. Donev (Courant
More informationApplications of Matrix Functions Part III: Quantum Chemistry
Applications of Matrix Functions Part III: Quantum Chemistry Emory University Department of Mathematics and Computer Science Atlanta, GA 30322, USA Prologue The main purpose of this lecture is to present
More informationELECTRONIC STRUCTURE CALCULATIONS FOR THE SOLID STATE PHYSICS
FROM RESEARCH TO INDUSTRY 32 ème forum ORAP 10 octobre 2013 Maison de la Simulation, Saclay, France ELECTRONIC STRUCTURE CALCULATIONS FOR THE SOLID STATE PHYSICS APPLICATION ON HPC, BLOCKING POINTS, Marc
More informationLast Time. Social Network Graphs Betweenness. Graph Laplacian. Girvan-Newman Algorithm. Spectral Bisection
Eigenvalue Problems Last Time Social Network Graphs Betweenness Girvan-Newman Algorithm Graph Laplacian Spectral Bisection λ 2, w 2 Today Small deviation into eigenvalue problems Formulation Standard eigenvalue
More informationAPPLIED NUMERICAL LINEAR ALGEBRA
APPLIED NUMERICAL LINEAR ALGEBRA James W. Demmel University of California Berkeley, California Society for Industrial and Applied Mathematics Philadelphia Contents Preface 1 Introduction 1 1.1 Basic Notation
More informationNumerical Methods I Eigenvalue Problems
Numerical Methods I Eigenvalue Problems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 October 2nd, 2014 A. Donev (Courant Institute) Lecture
More informationPolynomial filtering for interior eigenvalue problems Yousef Saad Department of Computer Science and Engineering University of Minnesota
Polynomial filtering for interior eigenvalue problems Yousef Saad Department of Computer Science and Engineering University of Minnesota SIAM CSE Boston - March 1, 2013 First: Joint work with: Haw-ren
More informationA Non-Linear Eigensolver-Based Alternative to Traditional Self-Consistent Electronic Structure Calculation Methods
University of Massachusetts Amherst ScholarWorks@UMass Amherst Masters Theses 1911 - February 2014 2013 A Non-Linear Eigensolver-Based Alternative to Traditional Self-Consistent Electronic Structure Calculation
More informationEIGIFP: A MATLAB Program for Solving Large Symmetric Generalized Eigenvalue Problems
EIGIFP: A MATLAB Program for Solving Large Symmetric Generalized Eigenvalue Problems JAMES H. MONEY and QIANG YE UNIVERSITY OF KENTUCKY eigifp is a MATLAB program for computing a few extreme eigenvalues
More informationLARGE SPARSE EIGENVALUE PROBLEMS
LARGE SPARSE EIGENVALUE PROBLEMS Projection methods The subspace iteration Krylov subspace methods: Arnoldi and Lanczos Golub-Kahan-Lanczos bidiagonalization 14-1 General Tools for Solving Large Eigen-Problems
More informationA Jacobi Davidson Method with a Multigrid Solver for the Hermitian Wilson-Dirac Operator
A Jacobi Davidson Method with a Multigrid Solver for the Hermitian Wilson-Dirac Operator Artur Strebel Bergische Universität Wuppertal August 3, 2016 Joint Work This project is joint work with: Gunnar
More informationILAS 2017 July 25, 2017
Polynomial and rational filtering for eigenvalue problems and the EVSL project Yousef Saad Department of Computer Science and Engineering University of Minnesota ILAS 217 July 25, 217 Large eigenvalue
More informationDirect Self-Consistent Field Computations on GPU Clusters
Direct Self-Consistent Field Computations on GPU Clusters Guochun Shi, Volodymyr Kindratenko National Center for Supercomputing Applications University of Illinois at UrbanaChampaign Ivan Ufimtsev, Todd
More informationOpportunities for ELPA to Accelerate the Solution of the Bethe-Salpeter Eigenvalue Problem
Opportunities for ELPA to Accelerate the Solution of the Bethe-Salpeter Eigenvalue Problem Peter Benner, Andreas Marek, Carolin Penke August 16, 2018 ELSI Workshop 2018 Partners: The Problem The Bethe-Salpeter
More informationSolving Symmetric Semi-definite (ill-conditioned) Generalized Eigenvalue Problems
Solving Symmetric Semi-definite (ill-conditioned) Generalized Eigenvalue Problems Zhaojun Bai University of California, Davis Berkeley, August 19, 2016 Symmetric definite generalized eigenvalue problem
More informationDensity Functional Theory
Density Functional Theory Iain Bethune EPCC ibethune@epcc.ed.ac.uk Overview Background Classical Atomistic Simulation Essential Quantum Mechanics DFT: Approximations and Theory DFT: Implementation using
More informationBehind the "exciting" curtain: The (L)APW+lo method
Behind the "exciting" curtain: The (L)APW+lo method Aug 7, 2016 Andris Gulans Humboldt-Universität zu Berlin Kohn-Sham equation Potential due to nuclei Exchange-correlation potential Potential due to electron
More informationSolving Many-Body Schrödinger Equation Using Density Functional Theory and Finite Elements
Solving Many-Body Schrödinger Equation Using Density Functional Theory and Finite Elements Institute of Physics, Academy of Sciences of the Czech Republic June 21, 2008 Introduction Contens Density Functional
More informationCME342 Parallel Methods in Numerical Analysis. Matrix Computation: Iterative Methods II. Sparse Matrix-vector Multiplication.
CME342 Parallel Methods in Numerical Analysis Matrix Computation: Iterative Methods II Outline: CG & its parallelization. Sparse Matrix-vector Multiplication. 1 Basic iterative methods: Ax = b r = b Ax
More informationParallelization of the Molecular Orbital Program MOS-F
Parallelization of the Molecular Orbital Program MOS-F Akira Asato, Satoshi Onodera, Yoshie Inada, Elena Akhmatskaya, Ross Nobes, Azuma Matsuura, Atsuya Takahashi November 2003 Fujitsu Laboratories of
More informationComputational Methods. Eigenvalues and Singular Values
Computational Methods Eigenvalues and Singular Values Manfred Huber 2010 1 Eigenvalues and Singular Values Eigenvalues and singular values describe important aspects of transformations and of data relations
More informationKarhunen-Loève Approximation of Random Fields Using Hierarchical Matrix Techniques
Institut für Numerische Mathematik und Optimierung Karhunen-Loève Approximation of Random Fields Using Hierarchical Matrix Techniques Oliver Ernst Computational Methods with Applications Harrachov, CR,
More informationReferences. Documentation Manuals Tutorials Publications
References http://siesta.icmab.es Documentation Manuals Tutorials Publications Atomic units e = m e = =1 atomic mass unit = m e atomic length unit = 1 Bohr = 0.5292 Ang atomic energy unit = 1 Hartree =
More informationComputing least squares condition numbers on hybrid multicore/gpu systems
Computing least squares condition numbers on hybrid multicore/gpu systems M. Baboulin and J. Dongarra and R. Lacroix Abstract This paper presents an efficient computation for least squares conditioning
More informationApplications of Renormalization Group Methods in Nuclear Physics 2
Applications of Renormalization Group Methods in Nuclear Physics 2 Dick Furnstahl Department of Physics Ohio State University HUGS 2014 Outline: Lecture 2 Lecture 2: SRG in practice Recap from lecture
More informationDFT / SIESTA algorithms
DFT / SIESTA algorithms Javier Junquera José M. Soler References http://siesta.icmab.es Documentation Tutorials Atomic units e = m e = =1 atomic mass unit = m e atomic length unit = 1 Bohr = 0.5292 Ang
More informationAccelerating Model Reduction of Large Linear Systems with Graphics Processors
Accelerating Model Reduction of Large Linear Systems with Graphics Processors P. Benner 1, P. Ezzatti 2, D. Kressner 3, E.S. Quintana-Ortí 4, Alfredo Remón 4 1 Max-Plank-Institute for Dynamics of Complex
More informationTR A Comparison of the Performance of SaP::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems
TR-0-07 A Comparison of the Performance of ::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems Ang Li, Omkar Deshmukh, Radu Serban, Dan Negrut May, 0 Abstract ::GPU is a
More informationFinding Rightmost Eigenvalues of Large, Sparse, Nonsymmetric Parameterized Eigenvalue Problems
Finding Rightmost Eigenvalues of Large, Sparse, Nonsymmetric Parameterized Eigenvalue Problems AMSC 663-664 Final Report Minghao Wu AMSC Program mwu@math.umd.edu Dr. Howard Elman Department of Computer
More informationLARGE SPARSE EIGENVALUE PROBLEMS. General Tools for Solving Large Eigen-Problems
LARGE SPARSE EIGENVALUE PROBLEMS Projection methods The subspace iteration Krylov subspace methods: Arnoldi and Lanczos Golub-Kahan-Lanczos bidiagonalization General Tools for Solving Large Eigen-Problems
More informationParallel Eigensolver Performance on High Performance Computers
Parallel Eigensolver Performance on High Performance Computers Andrew Sunderland Advanced Research Computing Group STFC Daresbury Laboratory CUG 2008 Helsinki 1 Summary (Briefly) Introduce parallel diagonalization
More informationDensity Functional Theory. Martin Lüders Daresbury Laboratory
Density Functional Theory Martin Lüders Daresbury Laboratory Ab initio Calculations Hamiltonian: (without external fields, non-relativistic) impossible to solve exactly!! Electrons Nuclei Electron-Nuclei
More informationA Chebyshev-based two-stage iterative method as an alternative to the direct solution of linear systems
A Chebyshev-based two-stage iterative method as an alternative to the direct solution of linear systems Mario Arioli m.arioli@rl.ac.uk CCLRC-Rutherford Appleton Laboratory with Daniel Ruiz (E.N.S.E.E.I.H.T)
More informationAccelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers
UT College of Engineering Tutorial Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers Stan Tomov 1, George Bosilca 1, and Cédric
More informationComputation of eigenvalues and singular values Recall that your solutions to these questions will not be collected or evaluated.
Math 504, Homework 5 Computation of eigenvalues and singular values Recall that your solutions to these questions will not be collected or evaluated 1 Find the eigenvalues and the associated eigenspaces
More informationDense Arithmetic over Finite Fields with CUMODP
Dense Arithmetic over Finite Fields with CUMODP Sardar Anisul Haque 1 Xin Li 2 Farnam Mansouri 1 Marc Moreno Maza 1 Wei Pan 3 Ning Xie 1 1 University of Western Ontario, Canada 2 Universidad Carlos III,
More informationParallelization Strategies for Density Matrix Renormalization Group algorithms on Shared-Memory Systems
Parallelization Strategies for Density Matrix Renormalization Group algorithms on Shared-Memory Systems G. Hager HPC Services, Computing Center Erlangen, Germany E. Jeckelmann Theoretical Physics, Univ.
More informationSolving Ax = b, an overview. Program
Numerical Linear Algebra Improving iterative solvers: preconditioning, deflation, numerical software and parallelisation Gerard Sleijpen and Martin van Gijzen November 29, 27 Solving Ax = b, an overview
More informationDivide and conquer algorithms for large eigenvalue problems Yousef Saad Department of Computer Science and Engineering University of Minnesota
Divide and conquer algorithms for large eigenvalue problems Yousef Saad Department of Computer Science and Engineering University of Minnesota PMAA 14 Lugano, July 4, 2014 Collaborators: Joint work with:
More informationInexact and Nonlinear Extensions of the FEAST Eigenvalue Algorithm
University of Massachusetts Amherst ScholarWorks@UMass Amherst Doctoral Dissertations Dissertations and Theses 2018 Inexact and Nonlinear Extensions of the FEAST Eigenvalue Algorithm Brendan E. Gavin University
More informationElectronic Structure Methodology 1
Electronic Structure Methodology 1 Chris J. Pickard Lecture Two Working with Density Functional Theory In the last lecture we learnt how to write the total energy as a functional of the density n(r): E
More informationRecent implementations, applications, and extensions of the Locally Optimal Block Preconditioned Conjugate Gradient method LOBPCG
MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Recent implementations, applications, and extensions of the Locally Optimal Block Preconditioned Conjugate Gradient method LOBPCG Knyazev,
More informationRational Krylov methods for linear and nonlinear eigenvalue problems
Rational Krylov methods for linear and nonlinear eigenvalue problems Mele Giampaolo mele@mail.dm.unipi.it University of Pisa 7 March 2014 Outline Arnoldi (and its variants) for linear eigenproblems Rational
More informationDELFT UNIVERSITY OF TECHNOLOGY
DELFT UNIVERSITY OF TECHNOLOGY REPORT -09 Computational and Sensitivity Aspects of Eigenvalue-Based Methods for the Large-Scale Trust-Region Subproblem Marielba Rojas, Bjørn H. Fotland, and Trond Steihaug
More informationA hybrid reordered Arnoldi method to accelerate PageRank computations
A hybrid reordered Arnoldi method to accelerate PageRank computations Danielle Parker Final Presentation Background Modeling the Web The Web The Graph (A) Ranks of Web pages v = v 1... Dominant Eigenvector
More informationTranslation Symmetry, Space Groups, Bloch functions, Fermi energy
Translation Symmetry, Space Groups, Bloch functions, Fermi energy Roberto Orlando and Silvia Casassa Università degli Studi di Torino July 20, 2015 School Ab initio Modelling of Solids (UniTo) Symmetry
More informationNotes on PCG for Sparse Linear Systems
Notes on PCG for Sparse Linear Systems Luca Bergamaschi Department of Civil Environmental and Architectural Engineering University of Padova e-mail luca.bergamaschi@unipd.it webpage www.dmsa.unipd.it/
More informationThe electronic structure of materials 2 - DFT
Quantum mechanics 2 - Lecture 9 December 19, 2012 1 Density functional theory (DFT) 2 Literature Contents 1 Density functional theory (DFT) 2 Literature Historical background The beginnings: L. de Broglie
More informationCHAPTER 3 WIEN2k. Chapter 3 : WIEN2k 50
CHAPTER 3 WIEN2k WIEN2k is one of the fastest and reliable simulation codes among computational methods. All the computational work presented on lanthanide intermetallic compounds has been performed by
More informationIncomplete Cholesky preconditioners that exploit the low-rank property
anapov@ulb.ac.be ; http://homepages.ulb.ac.be/ anapov/ 1 / 35 Incomplete Cholesky preconditioners that exploit the low-rank property (theory and practice) Artem Napov Service de Métrologie Nucléaire, Université
More informationElectronic Structure of Crystalline Solids
Electronic Structure of Crystalline Solids Computing the electronic structure of electrons in solid materials (insulators, conductors, semiconductors, superconductors) is in general a very difficult problem
More informationDensity-Matrix-Based Algorithms for Solving Eingenvalue Problems
University of Massachusetts - Amherst From the SelectedWorks of Eric Polizzi 2009 Density-Matrix-Based Algorithms for Solving Eingenvalue Problems Eric Polizzi, University of Massachusetts - Amherst Available
More informationKrylov Subspace Methods to Calculate PageRank
Krylov Subspace Methods to Calculate PageRank B. Vadala-Roth REU Final Presentation August 1st, 2013 How does Google Rank Web Pages? The Web The Graph (A) Ranks of Web pages v = v 1... Dominant Eigenvector
More informationCorrelations in Sequences of Generalized Eigenproblems Arising in Density Functional Theory
Aachen Institute for Advanced Study in Computational Engineering Science Preprint: AICES-211/8-1 12/August/211 Correlations in Sequences of Generalized Eigenproblems Arising in Density Functional Theory
More informationMatrix Algorithms. Volume II: Eigensystems. G. W. Stewart H1HJ1L. University of Maryland College Park, Maryland
Matrix Algorithms Volume II: Eigensystems G. W. Stewart University of Maryland College Park, Maryland H1HJ1L Society for Industrial and Applied Mathematics Philadelphia CONTENTS Algorithms Preface xv xvii
More informationKrylov subspace projection methods
I.1.(a) Krylov subspace projection methods Orthogonal projection technique : framework Let A be an n n complex matrix and K be an m-dimensional subspace of C n. An orthogonal projection technique seeks
More informationarxiv: v1 [hep-lat] 2 May 2012
A CG Method for Multiple Right Hand Sides and Multiple Shifts in Lattice QCD Calculations arxiv:1205.0359v1 [hep-lat] 2 May 2012 Fachbereich C, Mathematik und Naturwissenschaften, Bergische Universität
More informationHeterogeneous programming for hybrid CPU-GPU systems: Lessons learned from computational chemistry
Heterogeneous programming for hybrid CPU-GPU systems: Lessons learned from computational chemistry and Eugene DePrince Argonne National Laboratory (LCF and CNM) (Eugene moved to Georgia Tech last week)
More informationOrthogonal iteration to QR
Notes for 2016-03-09 Orthogonal iteration to QR The QR iteration is the workhorse for solving the nonsymmetric eigenvalue problem. Unfortunately, while the iteration itself is simple to write, the derivation
More informationIterative solvers for linear equations
Spectral Graph Theory Lecture 23 Iterative solvers for linear equations Daniel A. Spielman November 26, 2018 23.1 Overview In this and the next lecture, I will discuss iterative algorithms for solving
More informationAMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning
AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning Xiangmin Jiao SUNY Stony Brook Xiangmin Jiao Numerical Analysis I 1 / 18 Outline
More informationproblem Au = u by constructing an orthonormal basis V k = [v 1 ; : : : ; v k ], at each k th iteration step, and then nding an approximation for the e
A Parallel Solver for Extreme Eigenpairs 1 Leonardo Borges and Suely Oliveira 2 Computer Science Department, Texas A&M University, College Station, TX 77843-3112, USA. Abstract. In this paper a parallel
More informationCucheb: A GPU implementation of the filtered Lanczos procedure. Department of Computer Science and Engineering University of Minnesota, Twin Cities
Cucheb: A GPU implementation of the filtered Lanczos procedure Jared L. Aurentz, Vassilis Kalantzis, and Yousef Saad May 2017 EPrint ID: 2017.1 Department of Computer Science and Engineering University
More informationIntroduction to numerical computations on the GPU
Introduction to numerical computations on the GPU Lucian Covaci http://lucian.covaci.org/cuda.pdf Tuesday 1 November 11 1 2 Outline: NVIDIA Tesla and Geforce video cards: architecture CUDA - C: programming
More information