Algorithms and Computational Aspects of DFT Calculations, Part II
Juan Meza and Chao Yang
High Performance Computing Research, Lawrence Berkeley National Laboratory
IMA Tutorial: Mathematical and Computational Approaches to Quantum Chemistry
Institute for Mathematics and its Applications, University of Minnesota
September 26-27, 2008
Juan Meza (LBNL), Algorithms and Computational Aspects of DFT Calculations, September 27, 2008
1. Goals and Motivation
2. Review of Equations
3. Plane Wave DFT Computational Components
4. Parallelization Strategies
5. Future Computational Challenges
   - Linear Scaling Methods
   - Parallelism Issues
6. Software
   - Available Codes
   - KSSOLV
7. Summary
Goals
1. The Role of Computation
2. Review Equations and Solution Techniques
3. Discuss Major Computational Aspects of Plane Wave DFT codes
4. Present Some Parallelization Issues
5. Highlight Computational Challenges
Materials by design
"Advances in density functional theory coupled with multinode computational clusters now enable accurate simulation of the behavior of multi-thousand atom complexes that mediate the electronic and ionic transfers of solar energy conversion. These new and emerging nanoscience capabilities bring a fundamental understanding of the atomic and molecular processes of solar energy utilization within reach."
Basic Research Needs for Solar Energy Utilization, Report of the BES Workshop on Solar Energy Utilization, April 18-21, 2005
DFT codes are widely used for science applications
- 9470 nodes; 19,480 cores
- 13 Tflop/s SSP (100 Tflop/s peak)
- Upgrade to QuadCore (355 Tflop/s peak)
- DFT methods account for 75% of the materials sciences simulations at NERSC, totaling over 5 million hours of computer time in 2006
We can now simulate some realistic structures
- The charge density of a 15,000 atom quantum dot, Si$_{13607}$H$_{2236}$. Using 2048 processors at NERSC, the calculation took about 5 hours.
- The calculated dipole moment of a 2633 atom CdSe quantum rod, Cd$_{961}$Se$_{724}$H$_{948}$. Using 2560 processors at NERSC, the calculation took about 30 hours.
Kohn-Sham Equations
- Recall that our goal is to find the ground state energy by minimizing the Kohn-Sham total energy, $E_{\text{total}}$
- This leads to the Kohn-Sham equations:
  $$H\psi_i = \epsilon_i \psi_i, \quad i = 1, 2, \ldots, n_e$$
  $$H = \left[ -\tfrac{1}{2}\nabla^2 + V(\rho(r)) \right], \qquad V(\rho(r)) = V_{\text{ext}}(r) + \int \frac{\rho(r')}{|r - r'|}\,dr' + V_{xc}(\rho)$$
- This is a nonlinear eigenvalue problem, since the Hamiltonian $H$ depends on $\psi$ through the charge density $\rho$
Discretized Kohn-Sham Equations
- KKT conditions:
  $$\nabla_X \mathcal{L}(X, \Lambda) = 0, \qquad X^* X = I_{n_e}$$
- The discretized Kohn-Sham equations can now be written as:
  $$H(X)X = X\Lambda, \qquad X^* X = I_{n_e}$$
- The Kohn-Sham Hamiltonian is given by:
  $$H(X) = \tfrac{1}{2}L + V(X), \qquad V(X) = V_{\text{ext}} + \mathrm{Diag}\left(L^{\dagger}\rho(X)\right) + \mathrm{Diag}\, g_{xc}(\rho(X))$$
The SCF Iteration
1. Given an initial charge density $\rho_1$, compute a potential $V_k(\rho(r))$
2. Solve the linear eigenvalue problem
   $$\left[ -\tfrac{1}{2}\nabla^2 + V(\rho(r)) \right]\psi_i = E_i \psi_i$$
   for the $\psi_i$, $i = 1, \ldots, n_e$
3. Compute the new charge density $\rho(r) = \sum_{i}^{n_e} |\psi_i(r)|^2$
4. Update $\rho$ using your favorite mixing scheme
5. Compute $V_{k+1}$ and repeat until converged
Cycle: $\{\psi_i\}_{i=1,\ldots,n_e} \rightarrow \rho(r) \rightarrow V(\rho(r)) \rightarrow \{\psi_i\}$
- Overall computational complexity is $O(N n_e^2)$ due to linear algebra
- Major computational components: CG method, orthogonalization, computation of potentials, 3D FFT
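The five SCF steps can be sketched on a toy problem. The following is a minimal illustration, not taken from the slides: it assumes a hypothetical two-site model with one electron, hopping `t`, an external potential `v_ext`, and a density-dependent on-site potential `u * rho`. The 2x2 eigenproblem is solved in closed form, and simple linear mixing with parameter `alpha` stands in for "your favorite mixing scheme". All names and parameter values are assumptions chosen for illustration.

```python
import math


def lowest_state_density(a, d, b):
    """Site densities |psi_i|^2 of the lowest eigenvector of [[a, b], [b, d]], b != 0."""
    lam = 0.5 * (a + d) - math.sqrt((0.5 * (a - d)) ** 2 + b * b)
    v0, v1 = b, lam - a  # unnormalized eigenvector belonging to eigenvalue lam
    norm2 = v0 * v0 + v1 * v1
    return [v0 * v0 / norm2, v1 * v1 / norm2]


def scf(v_ext=(0.0, 0.5), t=1.0, u=1.0, alpha=0.5, tol=1e-10, max_iter=200):
    """Toy SCF loop: potential -> eigenproblem -> density -> mixing -> repeat."""
    rho = [0.5, 0.5]  # initial charge density
    for k in range(1, max_iter + 1):
        # Step 1: potential from the current density (toy model: v_ext + u * rho)
        v = [v_ext[i] + u * rho[i] for i in range(2)]
        # Steps 2-3: solve the linear eigenproblem and form the new density
        rho_new = lowest_state_density(v[0], v[1], -t)
        residual = max(abs(rho_new[i] - rho[i]) for i in range(2))
        if residual < tol:
            return rho_new, k, residual
        # Step 4: linear mixing of old and new densities
        rho = [(1.0 - alpha) * rho[i] + alpha * rho_new[i] for i in range(2)]
    raise RuntimeError("SCF did not converge")


rho, iterations, residual = scf()
print("density:", rho, "iterations:", iterations)
```

In a real plane-wave code the closed-form 2x2 solve is replaced by an iterative eigensolver and the mixing step is usually more sophisticated; only the loop structure carries over.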
What Are the Computational Issues?
- DFT methods account for 75% of the materials science simulations at NERSC
- Parallel efficiencies can be quite high
  - Parallelizing over the plane-wave basis can scale to 1,000 processors
  - Parallelizing over the plane-wave basis and the wavefunction index can scale to 10,000 processors
- Most codes are still based on $O(N^3)$ algorithms
- Not systematically improvable
- Inadequate for strong and/or non-local correlations
- Parallel efficiencies can be difficult to achieve; 10-20% parallel efficiency is not uncommon
Major Computational Components of Plane Wave DFT Codes
- Eigenvalue solver
- Orthogonalization
- 3D FFTs
- Computation of potentials
Eigenvalue Solver
- Need to solve one $N \times n_e$ linear eigenvalue problem at each SCF iteration
- The size of $N$ can easily be 10,000 to 100,000
- Only the $n_e$ ($\sim$ number of atoms) lowest eigenvalues and corresponding eigenvectors are needed
- Called diagonalization in chemistry/materials science circles
- Various approaches, including CG, Grassmann CG, and residual minimization
- A distinction is usually made between all-band and band-by-band methods, which correspond to solving for all eigenvectors simultaneously vs. solving for one eigenvector at a time. We would call this blocked vs. unblocked
- Use of optimized high-level BLAS3 routines can significantly improve performance
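The band-by-band (unblocked) structure can be illustrated with a small sketch. This is an assumption-laden toy, not the method used in any particular code: a shifted power iteration stands in for CG, the Hamiltonian is a hypothetical 1D discrete Laplacian `tridiag(-1, 2, -1)`, and each new band is kept orthogonal to the bands already converged. The function names and the shift value are chosen for illustration only.

```python
import math


def apply_h(x):
    """Apply the 1D discrete Laplacian tridiag(-1, 2, -1) to the vector x."""
    n = len(x)
    y = []
    for j in range(n):
        left = x[j - 1] if j > 0 else 0.0
        right = x[j + 1] if j < n - 1 else 0.0
        y.append(2.0 * x[j] - left - right)
    return y


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def normalize(x):
    s = math.sqrt(dot(x, x))
    return [a / s for a in x]


def project_out(x, bands):
    """Remove from x its components along the already converged bands."""
    for q in bands:
        c = dot(q, x)
        x = [a - c * b for a, b in zip(x, q)]
    return x


def band_by_band(n, n_bands, shift=4.0, iters=2000):
    """Find the n_bands lowest eigenpairs one at a time (unblocked)."""
    bands, energies = [], []
    for _band in range(n_bands):
        x = normalize(project_out([float(j + 1) for j in range(n)], bands))
        for _it in range(iters):
            hx = apply_h(x)
            # Power step with (shift*I - H): its dominant eigenvector is the lowest state of H
            x = [shift * a - b for a, b in zip(x, hx)]
            x = normalize(project_out(x, bands))
        energies.append(dot(x, apply_h(x)))  # Rayleigh quotient
        bands.append(x)
    return energies, bands


energies, bands = band_by_band(n=6, n_bands=2)
print(energies)
```

An all-band (blocked) variant would update every column of the trial matrix in the same sweep, which is what lets the work be expressed as matrix-matrix (BLAS3) operations.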
Orthogonalization
- Due to physical constraints, the electronic wavefunctions must be orthonormal
- This adds a constraint to the KS equations in the form of $X^* X = I_{n_e}$
- Can be time consuming for large systems
- Complexity is $O(N n_e^2)$, where $N$ is the size of the discretization and $n_e$ is the number of electrons
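To see where the $O(N n_e^2)$ cost comes from, here is a minimal sketch using modified Gram-Schmidt on real vectors. This is one common way to orthonormalize; it is an illustrative choice, not necessarily what production codes use, and the small input vectors are made up for the example. Each of the $n_e$ columns is projected against all earlier columns, and every projection costs $O(N)$.

```python
import math


def modified_gram_schmidt(columns):
    """Orthonormalize n_e linearly independent real vectors of length N."""
    q = []
    for col in columns:
        v = list(col)
        # Project out every previously accepted vector: up to n_e - 1 passes of cost O(N)
        for u in q:
            c = sum(a * b for a, b in zip(u, v))
            v = [b - c * a for a, b in zip(u, v)]
        norm = math.sqrt(sum(b * b for b in v))
        q.append([b / norm for b in v])
    return q


# Three hypothetical "wavefunctions" on a grid of N = 4 points
X = [[1.0, 1.0, 0.0, 0.0],
     [1.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 1.0, 1.0]]
Q = modified_gram_schmidt(X)

# The Gram matrix Q^T Q should be the identity
gram = [[sum(a * b for a, b in zip(qi, qj)) for qj in Q] for qi in Q]
print(gram)
```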
FFTs
- Recall that the kinetic energy operator takes on a particularly simple form in Fourier space (also called G-space)
- Most DFT codes take advantage of this fact by converting from real space to G-space for computation of the Hamiltonian
- Since systems are usually 3D, codes need to compute the 3D FFTs through a series of 1D FFTs
- This has consequences both for the total amount of work and for parallelizing the codes
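The decomposition of a 3D FFT into passes of 1D FFTs can be sketched as follows. This is a serial, pure-Python illustration on a tiny made-up grid (real codes use optimized FFT libraries and distribute the data); the radix-2 routine `fft1d` and the brute-force check `dft3d_direct` are helper names introduced here.

```python
import cmath


def fft1d(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def fft3d(a):
    """3D FFT as three passes of 1D FFTs, one pass per axis."""
    n1, n2, n3 = len(a), len(a[0]), len(a[0][0])
    b = [[[complex(a[i][j][k]) for k in range(n3)] for j in range(n2)] for i in range(n1)]
    for i in range(n1):          # pass 1: lines along axis 3
        for j in range(n2):
            b[i][j] = fft1d(b[i][j])
    for i in range(n1):          # pass 2: lines along axis 2
        for k in range(n3):
            line = fft1d([b[i][j][k] for j in range(n2)])
            for j in range(n2):
                b[i][j][k] = line[j]
    for j in range(n2):          # pass 3: lines along axis 1
        for k in range(n3):
            line = fft1d([b[i][j][k] for i in range(n1)])
            for i in range(n1):
                b[i][j][k] = line[i]
    return b


def dft3d_direct(a):
    """Brute-force 3D DFT straight from the definition, used only as a check."""
    n1, n2, n3 = len(a), len(a[0]), len(a[0][0])
    out = [[[0j] * n3 for _ in range(n2)] for _ in range(n1)]
    for p in range(n1):
        for q in range(n2):
            for r in range(n3):
                out[p][q][r] = sum(
                    a[i][j][k] * cmath.exp(-2j * cmath.pi * (p * i / n1 + q * j / n2 + r * k / n3))
                    for i in range(n1) for j in range(n2) for k in range(n3))
    return out


grid = [[[float(i * i + 2 * j - 3 * k) for k in range(2)] for j in range(2)] for i in range(4)]
fast, slow = fft3d(grid), dft3d_direct(grid)
max_err = max(abs(fast[p][q][r] - slow[p][q][r])
              for p in range(4) for q in range(2) for r in range(2))
print("max difference:", max_err)
```

In a distributed setting, passes 2 and 3 need lines of data that are not contiguous on one processor, which is why transposes appear between the passes.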
Computation of potentials
- The Hartree potential, $V_{\text{Hartree}} = \int \frac{\rho(r')}{|r - r'|}\,dr'$, can be computed in several ways
- The calculation can be posed as the solution of a Poisson problem. Fast Poisson solvers or multigrid can also be used
- Because the potential can be viewed as a convolution, it can also be computed using FFTs
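The FFT route can be illustrated with a one-dimensional periodic analog. The sketch below assumes the Poisson form $-V'' = 4\pi\rho$ on a periodic interval, so that each Fourier coefficient satisfies $V(G) = 4\pi\rho(G)/G^2$, with the $G = 0$ term set to zero. A naive $O(n^2)$ DFT keeps the example self-contained. The grid size, box length, and test density are made up, and real codes apply the same idea with 3D FFTs.

```python
import cmath
import math


def dft(x, sign):
    """Naive discrete Fourier transform; sign = -1 forward, +1 backward (unnormalized)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(sign * 2j * cmath.pi * j * m / n) for j in range(n))
            for m in range(n)]


def hartree_1d(rho, length):
    """Solve -V'' = 4*pi*rho on a periodic interval by dividing by G^2 in Fourier space."""
    n = len(rho)
    rho_g = dft(rho, -1)
    v_g = [0j] * n  # the G = 0 component is left at zero
    for m in range(1, n):
        freq = m if m <= n // 2 else m - n  # map index to signed frequency
        g = 2.0 * math.pi * freq / length
        v_g[m] = 4.0 * math.pi * rho_g[m] / (g * g)
    return [val.real / n for val in dft(v_g, +1)]


n, length = 16, 10.0
k1 = 2.0 * math.pi / length
xs = [length * j / n for j in range(n)]
rho = [math.cos(k1 * x) + 0.5 * math.cos(2.0 * k1 * x) for x in xs]

v = hartree_1d(rho, length)
# Analytic solution for this density, mode by mode
v_exact = [4.0 * math.pi * (math.cos(k1 * x) / k1 ** 2
                            + 0.5 * math.cos(2.0 * k1 * x) / (2.0 * k1) ** 2) for x in xs]
max_err = max(abs(a - b) for a, b in zip(v, v_exact))
print("max error:", max_err)
```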
Parallel Calculations Milestones
- 1991: Silicon surface reconstruction (7x7), Meiko I860, 64 processors (Stich, Payne, King-Smith, Lin, Clarke)
- 1998: FeMn alloys (exchange bias), Cray T3E, 1500 processors; first > 1 Tflop simulation, Gordon Bell prize (Ujfalussy, Stocks, Canning, Y. Wang, Shelton et al.)
- 2005: 1000 atom molybdenum simulation with Qbox, BlueGene/L at LLNL with 32,000 processors (F. Gygi et al.)
- 2008: Band-gap calculation of a 13,824 atom ZnTeO alloy proposed as a new solar cell material. Used 131,072 processors on Blue Gene/P at ANL and achieved 107.5 Tflop/s
Parallelization Strategies
- Parallel across k-points
  - Not useful for large systems, as k is usually small
- Parallel over electrons
  - Number of processors limited by the number of electrons
- Parallel over the number of plane-wave basis functions, $n_g$
  - Most commonly used in plane-wave codes
- Parallelization of DFT codes is nontrivial, and most codes cannot scale to large numbers of processors with even moderate efficiencies. 30% parallel efficiency is usually considered very good
- Parallelization issues for Hartree-Fock codes are similar, especially for SCF
Parallelization of 3D FFT
- 3D FFTs are computed via 3 sets of 1D FFTs and 2 transposes
- Most of the communication is in the global transpose, (b) to (c) in the figure
- Ratio of flops/comm $\sim \log N$
- Many FFTs are computed at the same time to avoid latency issues
- Only non-zero elements are computed/communicated
- For details see (Canning et al.): http://www.nersc.gov/projects/paratec/
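A rough estimate shows where the $\log N$ ratio comes from. It assumes $N$ total grid points, the standard operation count of about $5N\log_2 N$ flops for a complex FFT of that size, and a global transpose that moves a constant multiple $cN$ of the data. The constant $c$ is a placeholder, not a measured value.

```latex
\[
\frac{\text{flops}}{\text{communication}}
\;\sim\; \frac{5\,N \log_2 N}{c\,N}
\;=\; \frac{5}{c}\,\log_2 N
\;=\; O(\log N)
\]
```

Because the ratio grows only logarithmically with problem size, increasing $N$ does little to hide communication cost, which is consistent with the emphasis on batching many FFTs to amortize latency.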
Linear Scaling Electronic Structure Methods
- Goal is to reduce the computational work from $O(N^3)$ to $O(N)$
- Quantum mechanical effects are near-sighted, e.g. treat the computation of the exchange-correlation potential locally
- Need to introduce the concept of a localization region, inside which the quantity of interest is computed and outside which it is assumed to vanish
- Six strategies for taking advantage of this fact (see Goedecker (1999)):
  1. Fermi operator expansion
  2. Fermi operator projection
  3. Divide-and-conquer
  4. Density-matrix minimization
  5. Orbital minimization approach
  6. Optimal basis density-matrix minimization
LS3DF
- Based on the divide-and-conquer approach
- Divide a large system into smaller sub-domains that can be solved independently, then stitch the sub-domains back together again
- Classical electrostatic interactions are long-ranged, i.e. solve one global Poisson equation
- Requires minimal communication between the sub-domains
- Artificial boundary effects due to sub-dividing domains can be cancelled out
- Based on ideas from fragment molecular method
- We call our method Linear Scaling 3D Fragment or LS3DF [1]
[1] L.W. Wang, Z. Zhao, J. Meza, LBNL-61691 (2006)
Parallelism Issues
- Figure: IBM Cell Blade. Same processor as found in a Sony Playstation 3
- Multi-core and many-core is the wave of the future
- Current algorithms are difficult to parallelize with high efficiency
- Many quantum chemistry codes do not parallelize well even for medium-scale parallelism
Electronic Structure Codes
- ABINIT: www.abinit.org
- PARATEC: www.nersc.gov/projects/paratec
- PEtot: hpcrd.lbl.gov/linwang/petot/petot.html
- PWscf: www.pwscf.org
- NWChem: www.emsl.pnl.gov/docs/nwchem/nwchem.html
- Q-Chem: www.q-chem.com/
- Quantum Espresso: www.quantum-espresso.org
- Socorro: dft.sandia.gov/socorro
- VASP: cms.mpi.univie.ac.at/vasp
- Many, many more; apologies if your favorite code was not listed
KSSOLV Matlab package
- KSSOLV: Matlab code for solving the Kohn-Sham equations
- Open source package
- Handles SCF, DCM, Trust Region
- Example problems to get started with
- Object-oriented design, easy to extend
- Good starting point for students
- Beta version of KSSOLV available; ask one of us for more information!
Example: SiH$_4$
    a1 = Atom('Si');
    a2 = Atom('H');
    alist = [a1 a2 a2 a2 a2];
    xyzlist = [ 0.0  0.0  0.0
                1.61 1.61 1.61
                ... ];
    mol = Molecule();
    % c: the supercell, defined elsewhere (not shown on the slide)
    mol = set(mol, 'supercell', c);
    mol = set(mol, 'atomlist', alist);
    mol = set(mol, 'xyzlist', xyzlist);
    mol = set(mol, 'ecut', 25);
    mol = set(mol, 'name', 'SiH4');
    ...
    isosurface(rho);
Convergence
    [Etot, X, vtot, rho] = scf(mol);
    [Etot, X, vtot, rho] = dcm(mol);
Charge Density
    isosurface(rho);
Example: Pt$_6$Ni$_2$O
    cell: 19.59 0.0 0.0 ...
    sampling size: n1 = 96, n2 = 48, n3 = 48
    atoms and coordinates:
    1 Pt 1.3 -0.180 -0.015
    ...
    7 Ni 8.4 0.003 3.069
    8 Ni 8.5 7.998 7.762
    9 O 14.9 2.644 1.511
    number of electrons : 86
    spin type : 1
    kinetic energy cutoff: 60.0
Comparison of DCM vs. SCF
Summary
- Described the most common PW DFT computational components
- Overview of standard numerical methods used
- Brief introduction to some parallelization issues
- Listed some computational challenges
- Introduced KSSOLV, a Matlab package for solving the KS equations
References
- Aron J. Cohen, Paula Mori-Sánchez, Weitao Yang, Insights into Current Limitations of Density Functional Theory, Science, Vol. 321, no. 5890, pp. 792-794 (2008).
- F. Gygi, R. K. Yates, J. Lorenz, E. W. Draeger, F. Franchetti, C. W. Ueberhuber, B. R. de Supinski, S. Kral, J. A. Gunnels, J. C. Sexton, Proceedings of the 2005 ACM/IEEE conference on Supercomputing (2005).
- S. Goedecker, Linear Scaling Electronic Structure Methods, Rev. Mod. Phys. 71, 1085 (1999).
- Curtis L. Janssen and Ida M.B. Nielsen, Parallel Computing in Quantum Chemistry, CRC Press (2008).