Data structure of OpenMX and ADPACK

Similar documents
Localized basis methods Theory and implementations. Introduction of OpenMX Implementation of OpenMX

Pseudopotential generation and test by the ld1.x atomic code: an introduction

Why use pseudo potentials?

DFT in practice : Part II. Ersen Mete

Introduction of XPS Absolute binding energies of core states Applications to silicene

The Plane-Wave Pseudopotential Method

Theory of Pseudopotentials. Outline of Talk

Introduction of XPS Absolute binding energies of core states Applications to silicone Outlook

1 Construction of norm-conserving semi-local pseudopotentials for Si

Pseudopotentials and Basis Sets. How to generate and test them

Designed nonlocal pseudopotentials for enhanced transferability

Pseudopotentials for hybrid density functionals and SCAN

DFT / SIESTA algorithms

An Introduction to OPIUM

Pseudopotential methods for DFT calculations

References. Documentation Manuals Tutorials Publications

Pseudopotentials: design, testing, typical errors

Norm-conserving pseudopotentials and basis sets in electronic structure calculations. Javier Junquera. Universidad de Cantabria

Numerical atomic basis orbitals from H to Kr

Basis sets for SIESTA. Emilio Artacho. Nanogune, Ikerbasque & DIPC, San Sebastian, Spain Cavendish Laboratory, University of Cambridge

The Projector Augmented Wave method

Construction pseudo-potentials for the projector augmentedwave. CSCI699 Assignment 2

Electronic band structure, sx-lda, Hybrid DFT, LDA+U and all that. Keith Refson STFC Rutherford Appleton Laboratory

Efficient projector expansion for the ab initio LCAO method

Toward an O(N) method for the non-equilibrium Green's function calculation

2. TranSIESTA 1. SIESTA. DFT In a Nutshell. Introduction to SIESTA. Boundary Conditions: Open systems. Greens functions and charge density

CP2K: the gaussian plane wave (GPW) method

1. Hydrogen atom in a box

Dept of Mechanical Engineering MIT Nanoengineering group

Open-Source Pseudopotential Interface/Unification Module (OPIUM): The Basic Ins and Outs of Operation

On-the-fly pseudopotential generation in CASTEP

Density Functional Theory. Martin Lüders Daresbury Laboratory

The Linearized Augmented Planewave (LAPW) Method

Physics 492 Lecture 19

Projector augmented wave Implementation

Electron bands in crystals Pseudopotentials, Plane Waves, Local Orbitals

All electron optimized effective potential method for solids

Exercises on basis set generation Full control on the definition of the basis set functions: the PAO.Basis block. Javier Junquera

GWL Manual. September 30, 2013

Atomic orbitals of finite range as basis sets. Javier Junquera

COMPUTATIONAL TOOL. Fig. 4.1 Opening screen of w2web

Plane waves, pseudopotentials and PAW. X. Gonze Université catholique de Louvain, Louvain-la-neuve, Belgium

Chapter 3. The (L)APW+lo Method. 3.1 Choosing A Basis Set

Efficient implementation of the nonequilibrium Green function method for electronic transport calculations

The Quantum ESPRESSO Software Distribution

How to generate a pseudopotential with non-linear core corrections

DFT EXERCISES. FELIPE CERVANTES SODI January 2006

Accuracy benchmarking of DFT results, domain libraries for electrostatics, hybrid functional and solvation

Atomic orbitals of finite range as basis sets. Javier Junquera

ALMA: All-scale predictive design of heat management material structures

Speed-up of ATK compared to

Band calculations: Theory and Applications

Practical Guide to Density Functional Theory (DFT)

Oslo node. Highly accurate calculations benchmarking and extrapolations

Poisson Solver, Pseudopotentials, Atomic Forces in the BigDFT code

Introduction to DFTB. Marcus Elstner. July 28, 2006

Multi-reference Density Functional Theory. COLUMBUS Workshop Argonne National Laboratory 15 August 2005

Density Functional Theory: from theory to Applications

VASP: running on HPC resources. University of Vienna, Faculty of Physics and Center for Computational Materials Science, Vienna, Austria

Schrödinger equation for central potentials

Self-consistent Field

Lecture 9. Hartree Fock Method and Koopman s Theorem

Key concepts in Density Functional Theory (I) Silvana Botti

DFT calculations of NMR indirect spin spin coupling constants

DFT with Hybrid Functionals

Strategies for Solving Kohn- Sham equations

Electronic structure calculations with GPAW. Jussi Enkovaara CSC IT Center for Science, Finland

Fast spherical Bessel transform via fast Fourier transform and recurrence formula

Introduction to Density Functional Theory with Applications to Graphene Branislav K. Nikolić

Solving Many-Body Schrödinger Equation Using Density Functional Theory and Finite Elements

arxiv:physics/ v2 [physics.atom-ph] 31 May 2004

Introduction to First-Principles Method

Conquest order N ab initio Electronic Structure simulation code for quantum mechanical modelling in large scale

Computational Methods. Chem 561

Ab Initio Calculations for Large Dielectric Matrices of Confined Systems Serdar Ö güt Department of Physics, University of Illinois at Chicago, 845 We

Density Functional Theory (DFT) modelling of C60 and

Projector-Augmented Wave Method:

DGDFT: A Massively Parallel Method for Large Scale Density Functional Theory Calculations

Lecture on First-principles Computations (14): The Linear Combination of Atomic Orbitals (LCAO) Method

Density Functional Theory: from theory to Applications

ELSI: A Unified Software Interface for Kohn-Sham Electronic Structure Solvers

Fast and accurate Coulomb calculation with Gaussian functions

Problem 1: Spin 1 2. particles (10 points)

Table of Contents. Table of Contents Spin-orbit splitting of semiconductor band structures

Solid State Theory: Band Structure Methods

DENSITY FUNCTIONAL THEORY FOR NON-THEORISTS JOHN P. PERDEW DEPARTMENTS OF PHYSICS AND CHEMISTRY TEMPLE UNIVERSITY

All-electron density functional theory on Intel MIC: Elk

Density Functional Theory - II part

Walter Kohn was awarded with the Nobel Prize in Chemistry in 1998 for his development of the density functional theory.

Short-Ranged Central and Tensor Correlations. Nuclear Many-Body Systems. Reaction Theory for Nuclei far from INT Seattle

Massively parallel electronic structure calculations with Python software. Jussi Enkovaara Software Engineering CSC the finnish IT center for science

Pseudopotential. Meaning and role

Many electrons: Density functional theory Part II. Bedřich Velický VI.

Ab initio study of Mn doped BN nanosheets Tudor Luca Mitran

quantum ESPRESSO stands for Quantum open-source Package for Research in Electronic Structure, Simulation, and Optimization

THE JOURNAL OF CHEMICAL PHYSICS 127,

Electronic structure theory: Fundamentals to frontiers. 1. Hartree-Fock theory

Lecture on First-Principles Computation (11): The LAPW Method

PHYSICAL REVIEW B 66,

MODULE 2: QUANTUM MECHANICS. Practice: Quantum ESPRESSO

Transcription:

Data structure of OpenMX and ADPACK Structure of ADPACK Solving the 1D Dirac equation Pseudopotentials Localized basis functions Structure of OpenMX Input_std() Total energy Data structure for parallelization Taisuke Ozaki (Univ. of Tokyo) OpenMX developer s meeting, Dec. 21-22, 2015

What is ADPACK? ADPACK (Atomic Density functional program PACKage) is a software to perform density functional calculations for a single atom The features are listed below: All electron calculation by the Schrödinger or Dirac equation LDA and GGA treatment to exchang-correlation energy Finite element method (FEM) for the Schrödinger equation Pseudopotential generation by the TM, BHS, MBK schemes Pseudopotential generation for unbound states by Hamann's scheme Kleinman and Bylander (KB) separable pseudopotential Separable pseudopotential with Blöchl multiple projectors Partial core correction to exchange-correlation energy Logarithmic derivatives of wave functions Detection of ghost states in separable pseudopotentials Scalar relativistic treatment Fully relativistic treatment with spin-orbit coupling Generation of pseudo-atomic orbitals under a confinement potential The pseudopotentials and pseudo-atomic orbitals can be the input data for OpenMX.

ADPACK is freely available ADPACK can be available under GNU-GPL. http://www.openmx-square.org/

Programs of ADPACK Programs: Link: 65 C rounties and 5 header files (50,000 lines) LAPACK and BLAS Main routine: adpack.c Input: readfile.c, Inputtool.c Output: Output.c All electron calculations: All_Electron.c, Initial_Density.c, Core.c Numerical solutions for Schroedinger and Dirac eqs.: Hamming_I.c, Hamming_O.c Density: Density.c, Density_PCC.c, Density_V.c Exchange-Correlation: XC_CA.c, XC_EX.c, XC_PW91.c, XC_VWN.c, XC_PBE.c Mixing: Simple_Mixing.c Pseudopotentials: MBK.c, BHS.c, TM.c Pseudo-atomic orbitals: Multiple_PAO.c The global variables are declared in adpack.h.

1D-Dirac equation with a spherical potential 1-dimensional radial Dirac equation for the majority component G is given by The mass term is given by Minority component By expressing the function G by the following form, One obtain a set of equations: L の満たすべき条件 The charge density is obtained from

Solving the 1D-Dirac equation By changing the varial r to x with, and applying a predictor and corrector method, we can derive the following equations: For a given E, the L and M are solved from the origin and distant region, and they are matched at a matching point. L M In All_Electron.c, the calculation is performed.

Phase shift Scattering by a spherical potential Incident wave Scattered wave iqr iq r e i l sca (, r) e (2l 1) e sin lpl(cos ) qr l d r0 jl ( kr) r D ( ) ( 0 l jl kr0) tan l (, r0 ) dr d r0 nl ( kr) r D ( ) ( 0 l nl kr0) dr l i e qr 1 2 q 2 Logarithmic derivative of ψ d Dl(, r) r ln l( r) dr 2 D r drr r d r r r0 2 2 l(, ) r ( ) 0 2 l 0 0 l ( 0) If the norm of pseudized wave is conserved within r 0 and the logarithmic derivative coincides with that for the all electron case, the phase shift coincides with the all electron case to first order.

Norm-conserving pseudopotential by Troullier and Matins N. Troullier and J. L. Martins, Phys. Rev. B 43, 1993 (1991). For, the following form is used. Putting u l into radial Schroedinger eq. and solving it with respect to V, we have c 0 ~c 12 are determined by the following conditions: Norm-conserving condition within the cutoff radius The second derivatives of V (scr) is zero at r=0 Equivalence of the derivatives up to 4 th orders of u l at the cutoff radius

Ultrasoft pseudopotential by Vanderbilt D. Vanderbilt, PRB 41, 7892 (1990). The phase shift is reproduced around multiple reference energies by the following non-local operator. If the following generalized norm conserving condition is fulfilled, the matrix B is Hermitian. Thus, in the case the operator V NL is also Hermitian. If Q=0, then B-B*=0

How the non-local operator works? Operation of the non-local operator to pseudized wave function Note that It turns out that the following Schroedinger equation is satisfied.

The matrix B and the generalized norm conserving condition The matrix B is given by Thus, we have By integrating by parts (1) By performing the similar calculations, we obtain for the all electron wave functions (2) By subtracting (2) from (1), we have the following relation.

Norm-conserving pseudopotential by MBK I. Morrion, D.M. Bylander, and L. Kleinman, PRB 47, 6728 (1993). If Q ij = 0, the non-local operator can be transformed to a diagonal form. The form is exactly the same as that for the Blochl expansion, resulting in no need for modification of OpenMX. To satisfy Q ij =0, the pseudized wave function is written by The coefficients can be determined by matching up to the third derivatives to those for the all electron, and Q ij =0. Once c s are determined, χ is given by

The form of MBK pseudopotentials The pseudopotential is given by the sum of a local term V loc and non-local term V NL. (ps) V V () r V loc NL The local term V loc is independent of the angular channel l. On the other hand, the non-local term V NL is given by projectors V NL i i i i The projector consists of radial and spherical parts, and depends on species, radial and l-channels.

Optimization of pseudopotentials (i) Choice of parameters 1. Choice of valence electrons (semi-core included?) 2. Adjustment of cutoff radii by monitoring shape of pseudopotentials 3. Adustment of the local potential 4. Generation of PCC charge Optimization of PP typically takes a half week per a week. (ii) Comparison of logarithm derivatives No good If the logarithmic derivatives for PP agree well with those of the all electron potential, go to the step (iii), or return to the step (i). No good good (iii) Validation of quality of PP by performing a series of benchmark calculations. good Good PP

Database (2013) Optimized VPS and PAO are available, which includes relativistic effect with spin-orbit coupling. http://www.openmx-square.org/

Close look at vps files #1 Input file In the header part, the input file for the ADPACK calculations are shown, which maybe helpful for the next generation of pseudopotentials.

Close look at vps files #2 Eigenvalues for all electron calculation The eigenvalues with j=l±1/2 for the all electron calculations by the Dirac equation are included, which can be used to estimate the splitting by spin-orbit coupling

Close look at vps files #3 Information for pseudopotentials The specification for the pseudopotentials is made by vps.type, number.vps, and pseudo.nandl. The project energies λ is shown as follows:

Close look at vps files #4 The generated pseudopotentials are output by Pseudo.Potentials 1st column: x 2 nd column: r=exp(x) in a.u. 3 rd column: radial part of local pseudopotential 4 th and later columns: radial part of non-local pseudopotentials.

Close look at vps files #5 Charge density for partial core correction 1st column: x 2 nd column: r=exp(x) in a.u. 3 rd column: charge density for PCC

Primitive basis functions 1. Solve an atomic Kohn-Sham eq. under a confinement potential: s-orbital of oxygen 2. Construct the norm-conserving pseudopotentials. 3. Solve ground and excited states for the the peudopotential for each L-channel. In most cases, the accuracy and efficiency can be controlled by Cutoff radius Number of orbitals PRB 67, 155108 (2003) PRB 69, 195113 (2004)

Variational optimization of basis functions One-particle wave functions Contracted orbitals The variation of E with respect to c with fixed a gives Regarding c as dependent variables on a and assuming KS eq. is solved self-consistently with respect to c, we have Ozaki, PRB 67, 155108 (2003)

Optimization of basis functions 1. Choose typical chemical environments 2. Optimize variationally the radial functions 3. Rotate a set of optimized orbitals within the subspace, and discard the redundant funtions

Close look at pao files #1 Contraction coefficients The optimized PAO are generally obtained by two or three calculations for orbital optimization. The contraction coefficients are attached in the header of the pao file.

Close look at pao files #2 Input file The input file for the ADPACK calculations is attached which can be helpful for checking the computational condition.

Close look at pao files #3 e.g., Si7.0.pao 1 4 8 The eigenvalues of pseudized and confined states are shown. The information can be used to choose basis functions step by step. 2 5 3 7 One may find that the structural properties can be obtained by Si7.0- s2p2d1. Also, it is found that the cutoff of 7.0 (a.u) reaches the convergent result from the comparison between Si7.0-s3p2d2f1 and Si8.0-s3p2d2f1. 6

Close look at pao files #4 Valence density 1st column: x 2 nd column: r=exp(x) in a.u. 3 rd column: valence density for the chosen states

Close look at pao files #4 The generated radial functions are output by pseudo.atomic.orbitals.l=0,1, 1st column: x 2 nd column: r=exp(x) in a.u. 3 rd and later columns: radial part of basis functions

Δ-factor The delta factor is defined as difference of total energy between Wien2k (FLAPW+LO) and a code under testing, which is shown as shaded region in figure below, where the volume is changed by plus and minus 6 % taken from the equilibrium V 0. Lejaeghere et al., Critical Reviews in Solid State and Materials Sciences 39, 1-24 (2014).

Comparison of codes in terms of Δ-factor http://molmod.ugent.be/deltacodesdft

Structures of OpenMX Language: C, fortran90 265 sub files, about 1000 sub routines 21 header files About 300,000 lines Compilation by makefile Eigenvalue solver: ELPA included Linking of LAPACK, BLAS, FFTW3 Hybrid parallelization by MPI,OpenMP openmx_common.h Basic structure Common functions and common variables are declared in the header file. openmx() Input_std() truncation() DFT() MD_pac() OutData() Main computational flow Reading input file Analysis of geometrical structure, memory allocation, analysis of communication pattern SCF, total energy, forces Molecular dynamics and geometry optimization Output

Input_std.c The input file is analyzed in Input_std() which employs Inputtool.c. After searching keywords, the value after the keyword is set for the keyword. The variables for the keywords are declared in openmx_common.h. Inputtool.c allows us to write the input file in arbitrary order for the keywords.

Implementation: Total energy (1) The total energy is given by the sum of six terms, and a proper integration scheme for each term is applied to accurately evaluate the total energy. Kinetic energy Coulomb energy with external potential Hartree energy Exchange-correlation energy Core-core Coulomb energy TO and H.Kino, PRB 72, 045121 (2005)

Implementation: Total energy (2) The reorganization of Coulomb energies gives three new energy terms. The neutral atom energy Short range and separable to twocenter integrals s Difference charge Hartree energy Long range but minor contribution Screened core-core repulsion energy Short range and two-center integrals Difference charge Neutral atom potential

Implementation: Total energy (3) So, the total energy is given by Each term is evaluated by using a different numerical grid with consideration on accuracy and efficiency. } Spherical coordinate in momentum space The relevant subroutines Set_OLP_Kin.c, Total_Energy.c Set_ProExpn_VNA.c, Total_Energy.c Set_Nonlocal.c, Total_Energy.c } Real space regular mesh Poisson.c, Total_Energy.c Set_XC_Grid.c, Total_Energy.c Real space fine mesh Total_Energy.c

Atomic 3D atomic partitioning How one can partition atoms to minimize communication and memory usage in the parallel calculations? Requirement: Locality Same computational cost Applicable to any systems Small computational overhead T.V.T. Duy and T. Ozaki, Comput. Phys. Commun. 185, 777-789 (2014).

Modified recursive bisection If the number of MPI processes is 19, then the following binary tree structure is constructed. In the conventional recursive bisection, the bisection is made so that a same number can be assigned to each region. However, the modified version bisects with weights as shown above. This is performed in Set_Allocate_Atom2CPU.c

Reordering of atoms by an inertia tensor This is performed in Set_Allocate_Atom2CPU.c

Recursive atomic partitioning The method guarantees Locality of atomic partitioning Balanced computational cost Applicability to any systems Small computational cost T.V.T. Duy and T. Ozaki, CPC 185, 777 (2014).

Allocation of atoms to processess Diamond 16384 atoms, 19 processes Multiply connected CNT, 16 processes

Definition of neighboring atoms Each atom has strictly localized basis functions. Thus, the nonzero overlap between basis functions occurs if r 12 < (r c1 + r c2 ). The analysis is performed in Trn_System() of truncation.c, and relevant variables for the information are r c2 r 12 FNAN[]: # of neighboring atoms natn[][]: global index of the neighboring atom ncn[][]: cell index of the neighboring atom r c1

Three indices for atoms Global index: if there are N atoms in the unit cell, then each atom has a global index which is within 1 to N. intermediate index: a set of atoms (N p atoms) are assigned to each MPI process. Then index within the MPI process each atom has an intermediate which is within 1 to N p. Local index: The second index of natn[][] and ncn[][] runs for a local index which is within 1 to FNAN[].

Example: the three indices of atoms Only the non-zero Hamiltonian matrix elements are stored in H. An example is given below to show how the conversion among the three indices is made. where Mc_AN is the intermediate index, and Gc_AN is the global index, h_an is the local index, respectively. M2G, F_G2M, and S_G2M can be used to convert the indices.

Cutoff energy for regular mesh The two energy components E δee + E xc are calculated on real space regular mesh. The mesh fineness is determined by plane-wave cutoff energies. The cutoff energy can be related to the mesh fineness by the following eqs.

Partitioning of grids Uniform grid is used to calculate matrix elements and solve Poisson s equation. A hundred million grid points for a few dozen thousand atoms. A proper one of four data structures for grid is used for each calculation. They are designed to minimize the MPI communication. These data structure are all constructed in Construct_MPI_Data_Structure_Grid() and Set_Inf_SndRcv() of truncation.c.

Case study #1 Values of basis functions are calculated on grids in the structure A. In Set_Orbitals_Grid.c Nc runs over grids in the structure A.

Case study #2: 2D-parallelization of 3D-FFT in Poisson.c c c a Compared to 1D-parallelization, no increase of MPI communication up to N. Even at N 2, just double communication. b b a

Case study #3 Each MPI process calculates a part of the energy terms evaluated on the regular mesh using the structure B(ABC) as follows: In Total_Energy.c BN runs over grids in the structure B(ABC).

Case study #4 Each MPI process calculates exchange-correlation potential and energy density using the structure D as follows: In Set_XC_Grid.c MN runs over grids in the structure D.

Summary The data structure of OpenMX is carefully designed so that the size of memory and MPI communication can be minimized. The first step to construct the data structure is how atoms are allocated to each MPI process by using the modified bisection and inertia tensor projection methods. The second step to construct the data structure is how four sorts of grid structure are constructed. Each of calculations are performed by choosing one of the four grid structures, and changing the grid structure requires the MPI communication, of which pattern is determined beforehand in truncation().