Domain specific libraries. Material science codes on innovative HPC architectures Anton Kozhevnikov, CSCS December 5, 2016

Size: px
Start display at page:

Download "Domain specific libraries. Material science codes on innovative HPC architectures Anton Kozhevnikov, CSCS December 5, 2016"

Transcription

1 Domain specific libraries Material science codes on innovative HPC architectures Anton Kozhevnikov, CSCS December 5, 2016

2 Part 1: Introduction

3 Kohn-Shame equations 1 2 Eigen-value problem + v eff (r) j(r) =" j j (r) Effective potential construction Z (r 0 ) v eff (r) = r 0 r dr0 + v XC [ ](r) + v ext (r) Density generation new (r) = X j j (r) 2 Density mixing (r) = new (r)+(1 ) old (r) Output: wave-functions j(r) and eigen energies " j charge density (r) and magnetization m(r) total energy E tot and atomic forces F Derived objects: Spectral functions Linear response functions Phonons, thermodynamic functions Parameters of lattice models

4 Basis expansion Expand Kohn-Sham wave functions in a convenient basis set: j(r) = X µ C µj µ (r) Convert differential equation to an eigen-value problem: HC j = " j OC j H µµ 0 = Z µ(r) O µµ 0 = Z where v eff (r) µ(r) µ 0(r)dr µ 0 (r)dr

5 Electronic structure codes Code Quantum ESPRESSO VASP ABINIT PEtot CPMD QBOX CASTEP SIESTA CRYSTAL FHI-AIMS FPLO OpenMX TB-LMTO CP2K BigDFT WIEN2k FLEUR Exciting Elk GPAW Octopus Basis set plane-waves localized atom-centered orbitals gaussians + plane-waves wavelets linarized augmented plane-waves real space grid / plane waves / atomic orbitals

6 Plane-wave basis Plane-wave with the wave-vector G e igr = e i(xg x+yg y +zg z ) = cos(gr)+i sin(gr) = 2 G Complete orthogonal basis Eigen-functions of the Laplace z e igr = G 2 e igr Fourier transformations can be rapidly evaluated with FFT Real-space integrals between periodic functions are represented as sums Z f (r)g(r)dr = N X f (r j )g(r j )= X j G f (G)g(G)

7 Pseudopotential Plane-wave basis requires a pseudopotential Unlike the true (all-electron) potential, pseudopotential has an additional nonlocal term: ˆV PS = V loc (r)+ X X 0 id 0 h 0 Action of the pseudopotential on the wave-function: ˆV PS (r) =V loc (r) (r)+ X X 0 Z (r)d 0 0 (r) (r)dr

8 Eigen-value problem H j = " j j H j = " j S j in case of norm-conserving pseudopotential in case of ultrasoft pseudopotential H and S are dense Hermitian matrices of large dimension (~1000 PW / atom).

9 Eigen-value problem H j = " j j H j = " j S j in case of norm-conserving pseudopotential in case of ultrasoft pseudopotential H and S are dense Hermitian matrices of large dimension (~1000 PW / atom). Iterative subspace diagonalziation methods are used. Conjugate gradient Davidson algorithm RMM-DIIS LOBPCG Chebyshev filering method

10 Hamiltonian operator application Any iterative solver requires operation Ĥ i

11 Hamiltonian operator application Any iterative solver requires operation Ĥ i Result of Hamiltonian application h j = H j = Z e igr Ĥ j (r)dr

12 Hamiltonian operator application Any iterative solver requires operation Ĥ i Result of Hamiltonian application h j = H j = Z e igr Ĥ j (r)dr Hamiltonian operator with pseudopotential Ĥ = v eff (r)+ X X 0 id 0 h 0

13 Hamiltonian operator application Application of the local part of potential (Vloc kernel) j(g) FFT 1! j (r)! v eff (r) j (r) FFT! h j (G)

14 Hamiltonian operator application Application of the local part of potential (Vloc kernel) j(g) FFT 1! j (r)! v eff (r) j (r) FFT! h j (G) do j = 1, num_psi! transform psi to real-space domain call inverse_fft(psi(:,j), psi_of_r(:))! multiply by effective potential do ir = 1, num_r_points psi_of_r(ir) = psi_of_r(ir) * veff(ir) end do! tranform back to PW domain call forward_fft(psi_of_r(:), hpsi(:,j)) end do Two FFTs for each wave-function!

15 Hamiltonian operator application 1 2 j(r) = X G Application of Laplace operator 1 2 eigr j(g) = X G e igr G2 2 j(g)

16 Hamiltonian operator application 1 2 j(r) = X G Application of Laplace operator 1 2 eigr j(g) = X G e igr G2 2 j(g) do j = 1, num_psi! add kinetic energy contribution do ig = 1, num_gvec hpsi(ig, j) = hpsi(ig, j) + pw_ekin(ig) * psi(ig, j) end do end do

17 Hamiltonian operator application X Application of non-local part of potential: (G) X 0 D 0 X G 0 zgemm#1 zgemm#2 zgemm#3 X 0 id 0h 0 0 (G 0 ) j(g 0 )! h j (G)

18 Hamiltonian operator application X Application of non-local part of potential: (G) X 0 D 0 X G 0 zgemm#1 zgemm#2 zgemm#3 X 0 id 0h 0 0 (G 0 ) j(g 0 )! h j (G)! compute <beta psi> call zgemm('c', 'N', num_beta_tot, num_psi, num_pw, beta_pw, psi, beta_psi)! apply block-diagonal D matrix do ia = 1, num_atoms call zgemm('n', 'N', num_beta(ia), num_psi, num_beta(ia), D(:, :, ia), beta_psi(offset(ia), 1), d_beta_psi(offset(ia), 1)) end do! compute beta*d*<beta psi> call zgemm('n', 'N', num_pw, num_psi, num_beta_tot, beta_pw, d_beta_psi, hpsi)

19 Hamiltonian operator application X Application of non-local part of potential: (G) X 0 D 0 X G 0 zgemm#1 zgemm#2 zgemm#3 X 0 id 0h 0 0 (G 0 ) j(g 0 )! h j (G) plane-wave index G h j (G) j += index of atom (G) and orbital D 0 D 1 D 2 D 3 0 (G 0 ) j(g) j

20 Davidson iterative solver We want to compute H j = " j j We know how to compute h j = H j

21 Davidson iterative solver We want to compute H j = " j j We know how to compute h j = H j Key idea of Davidson iterative solver: start with a subspace spanned by a guess to and expand it with preconditioned residuals j

22 Davidson iterative solver 0 Initialize the trial basis set: j ( j

23 Davidson iterative solver 0 Initialize the trial basis set: j ( j Apply Hamiltonian to the basis functions: h m j = H m j

24 Davidson iterative solver 0 Initialize the trial basis set: j ( j Apply Hamiltonian to the basis functions: h m j = H m j Compute reduced Hamiltonian matrix: hm jj 0 = X G m j (G) h m (G) j 0

25 Davidson iterative solver 0 Initialize the trial basis set: j ( j Apply Hamiltonian to the basis functions: h m j = H m j Compute reduced Hamiltonian matrix: hm jj 0 = X G Diagonalize reduced Hamiltonian matrix and get N lowest eigen pairs: h m Z m = j Z m m j (G) h m (G) j 0

26 Davidson iterative solver 0 Initialize the trial basis set: j ( j Apply Hamiltonian to the basis functions: h m j = H m j Compute reduced Hamiltonian matrix: hm jj 0 = X G Diagonalize reduced Hamiltonian matrix and get N lowest eigen pairs: h m Z m = j Z m m j (G) h m (G) j 0 Compute residuals ( ): Rm j = h m j Z m m R j j Z m j = Ĥ j j j

27 Davidson iterative solver 0 Initialize the trial basis set: j ( j Apply Hamiltonian to the basis functions: h m j = H m j Compute reduced Hamiltonian matrix: hm jj 0 = X G Diagonalize reduced Hamiltonian matrix and get N lowest eigen pairs: m j (G) h m (G) j 0 Compute residuals ( ): Rm j = h m j Z m m R j j Z m j = Ĥ j j j Apply preconditioner to the unconverged residuals, orthogonalize and add the resulting functions to the basis: h m Z m = j Z m { m+1 j } = { m j } M {P R m j }

28 Davidson iterative solver 0 Initialize the trial basis set: j ( j Apply Hamiltonian to the basis functions: h m j = H m j Compute reduced Hamiltonian matrix: hm jj 0 = X G Diagonalize reduced Hamiltonian matrix and get N lowest eigen pairs: m j (G) h m (G) j 0 Compute residuals ( ): Rm j = h m j Z m m R j j Z m j = Ĥ j j j Apply preconditioner to the unconverged residuals, orthogonalize and add the resulting functions to the basis: h m Z m = j Z m { m+1 j } = { m j } M {P R m j } Recompute the wave-functions: j = m j Z m

29 Davidson iterative solver Initialize subspace basis functions and apply Hamiltonian 0 j h 0j = H 0 j plane-wave index G

30 Davidson iterative solver Compute reduced Hamiltonian matrix h 0 jj 0 0 j h 0j = Compute eigen-value probelem = " j

31 Davidson iterative solver Compute residuals R 0 j h 0 j Z 0 Z 0 j jj 0 0 j = - Apply preconditioner: P R 0 j (G) =(H GG " j ) 1 R0 j (G)

32 Davidson iterative solver Expand variational space and apply Hamiltonian to new basis functions 1 j = 0 j M P R0 j h 1 j = H 1 j plane-wave index G

33 Davidson iterative solver Compute reduced Hamiltonian matrix h 1 jj 0 1 j h1 j = Compute eigen-value probelem = " j

34 Davidson iterative solver Compute residuals R 1 j h 1 j Z 1 Z 1 j jj 0 1 j = - Apply preconditioner: P R 1 j (G) =(H GG " j ) 1 R1 j (G)

35 Davidson iterative solver Continue to expand variational space 2 j = 1 j M P R1 j h 2j = H 2 j plane-wave index G

36 Davidson iterative solver Continue to expand variational space 2 j = 1 j M P R1 j h 2j = H 2 j plane-wave index G Iterate until the convergence (all residuals are zero) is reached

37 Davidson iterative solver Recompute the wave-functions m j j = Z m Take " j from the last subspace diagonalization

38 Part 2: Implementation details

39 Common operations FFTs (Vloc kernel, density summation) Subspace Hamiltonian construction = Wave-functions and residuals update = Wave-function orthogonalization

40 Common operations FFTs (Vloc kernel, density summation) Subspace Hamiltonian construction = Wave-functions and residuals update = Wave-function orthogonalization Need scalable parallel implementation!

41 Wave-function storage and distribution Wave-functions i(g) represents a single wave-function. are stored as a matrix: each column of the matrix band index i Number of bands: ~100 10K plane-wave index G Number of plane-waves: ~100K 10M

42 Wave-function storage and distribution How to distribute the matrix of plane-wave coefficients? Split bands Split G-vectors Split bands and G-vectors 2D block-cyclic distribution band index i band index i band index i band index i plane-wave index G plane-wave index G plane-wave index G plane-wave index G

43 Wave-function storage and distribution How to distribute the matrix of plane-wave coefficients? Split bands Split G-vectors Split bands and G-vectors 2D block-cyclic distribution band index i band index i band index i band index i plane-wave index G plane-wave index G plane-wave index G plane-wave index G

44 Wave-function storage and distribution How to distribute the matrix of plane-wave coefficients? Split G-vectors Split bands band index i Split bands and G-vectors 2D block-cyclic distribution band index i band index i band index i plane-wave index G plane-wave index G plane-wave index G plane-wave index G

45 FFT kernel do i=1, Nb enddo i(g) FFT 1! i(r)! i(r) V loc (r) FFT! [ iv ](G)

46 FFT kernel do i=1, Nb enddo i(g) FFT 1! i(r)! i(r) V loc (r) FFT! [ iv ](G) Wave-functions in the plane-wave domain are defined inside a sphere with a given energy cutoff: G 2 max i(g) 6= 0 for G apple G max ; = E cut 2

47 FFT kernel do i=1, Nb enddo i(g) FFT 1! i(r)! i(r) V loc (r) FFT! [ iv ](G) Wave-functions in the plane-wave domain are defined inside a sphere with a given energy cutoff: G 2 max i(g) 6= 0 for G apple G max ; = E cut 2 FFT box for the Vloc kernel typically circumscribes a sphere of radius 2Gmax. z x y

48 FFT decomposition to 1D and 2D The full 3D transformation is decomposed into a 1D transformation along the z-direction and a 2D transformation in the xy-plane: (G x,g y,g z ) (G x,g y,z) (x, y, z) FFT -1 (z) FFT(z) FFT -1 (xy) FFT(xy)

49 FFT parallelization

50 FFT parallelization In real space z-dimension of the FFT buffer is split among MPI ranks MPI rank#0 z MPI rank#1 MPI rank#2 y x

51 FFT parallelization In real space z-dimension of the FFT buffer is split among MPI ranks In reciprocal space z-sticks of the G-vector sphere are distributed among MPI ranks such that: local number of z-sticks is roughly equivalent sum of lengths of z-sticks (local number of G- vectors) is roughly equivalent MPI rank#0 z MPI rank#1 MPI rank#2 y x

52 Parallel transformation of z-sticks FFT -1 (z) FFT(z)

53 Parallel transformation of z-sticks FFT -1 (z) FFT(z) each rank executes 1D transformations of the local fraction of z-sticks FFT -1 (z) p0 p1 p2 FFT(z) p0 p1 p2 {full-z, partial-xy} distribution of a cylinder

54 Parallel transformation of z-sticks FFT -1 (z) FFT(z) each rank executes 1D transformations of the local fraction of z-sticks z-sticks are swapped between MPI ranks using MPI_Alltoall p0 p0 p1 p2 FFT -1 (z) FFT(z) MPI_Alltoall p1 p2 p0 p1 p2 {full-z, partial-xy} distribution of a cylinder {partial-z, full-xy} distribution of a cylinder

55 Data redistribution Np ranks band index i p0 p1 p2 p3 p4 p5 p6 p7 plane-wave index G

56 Data redistribution Using all available MPI ranks for a single FFT is not efficient! Np ranks band index i p0 p1 p2 p3 p4 p5 p6 p7 plane-wave index G

57 Data redistribution Using all available MPI ranks for a single FFT is not efficient! Remap wave-functions to a 2D MPI grid where one dimension is dedicated to a parallel FFT and second dimension is used to distribute wave-function band index Np ranks band index i p0 p1 p2 p3 p4 p5 p6 p7 plane-wave index G 2D MPI grid of Nr x Nc ranks p0 p1 p2 p3 p4 p5 p6 p7

58 Data redistribution Using all available MPI ranks for a single FFT is not efficient! Remap wave-functions to a 2D MPI grid where one dimension is dedicated to a parallel FFT and second dimension is used to distribute wave-function band index Np ranks band index i p0 p1 p2 p3 p4 p5 p6 p7 plane-wave index G MPI A2A between Nc ranks MPI A2A between Nc ranks 2D MPI grid of Nr x Nc ranks p0 p1 p2 p3 p4 p5 p6 p7 Groups of Nc ranks contain the fat slab of plane-wave coefficients for all bands

59 FFT Vloc kernel Change wave-function distribution from slab to 2D grid (MPI_A2A between Nc ranks) Execute 1D backward FFTs along z for the local set of G-vector sticks swap z-columns (MPI_A2A between Nr ranks) execute 2D backward FFTs in the xy-plane multiply by effective potential execute 2D forward FFTs in the xy-plane swap z-columns (MPI_A2A between Nr ranks) Execute 1D forward FFTs along z for the local set of z-sticks Change wave-function distribution from 2D grid to slab (MPI_A2A between Nc ranks)

60 FFT kernel benchmark on Piz Daint Number of bands is 3000, number of times Vloc is applied is 4, number of G-vectors is ~9.2M, FFT grid size is Total Vloc time Total Vloc efficiency FFT time FFT efficiency 1 Time (sec.) Number of nodes Efficiency

61 Inner product O ij = X G i (G) j(g)

62 Inner product O ij = X G i (G) j(g) Slab data distribution of band index i i(g) plane-wave index G MPI rank #0 MPI rank #1 MPI rank #2 MPI rank #3

63 Inner product Slab data distribution of band index i O ij = X G i(g) i (G) j(g) 2D block-cyclic distribution of Oij band index j plane-wave index G MPI rank #0 MPI rank #1 MPI rank #2 MPI rank #3 band index i

64 Inner product: simple implementation

65 Inner product: simple implementation 1. Each rank computes a contribution to the global matrix p NGX O p ij = i (G) j(g) G j i MPI rank#0 MPI rank#1

66 Inner product: simple implementation 1. Each rank computes a contribution to the global matrix p NGX O p ij = i (G) j(g) 2. MPI_Allreduce of global matrix is performed N X p O ij = G p=1 O p ij j i + + = MPI rank#0 MPI rank#1

67 Inner product: simple implementation j 1. Each rank computes a contribution to the global matrix p NGX O p ij = i (G) j(g) 2. MPI_Allreduce of global matrix is performed N X p O ij = G p=1 O p ij 3. Each rank picks the local fraction of matrix elements from Oij i + + = MPI rank#0 MPI rank#1

68 Inner product: simple implementation j 1. Each rank computes a contribution to the global matrix p NGX O p ij = i (G) j(g) 2. MPI_Allreduce of global matrix is performed N X p O ij = G p=1 O p ij 3. Each rank picks the local fraction of matrix elements from Oij i + + = MPI rank#0 MPI rank#1 Requires MPI_Allreduce of global matrix! Each MPI rank receives a lot of unnecessary data!

69 Inner product: reduce-to-one implementation

70 Inner product: reduce-to-one implementation 1. Each rank computes a contribution to the local panel of (pr, pc) Cartesian rank NGX p O p i r j c = i r (G) j c (G) G jc ir MPI rank#0 MPI rank#1 MPI rank#2

71 Inner product: reduce-to-one implementation 1. Each rank computes a contribution to the local panel of (pr, pc) Cartesian rank NGX p O p i r j c = i r (G) j c (G) G 2. MPI_Reduce of local panel is performed to (pr, pc) Cartesian rank N X p O ir j c = p=1 O p i r j c jc ir MPI rank#0 MPI rank#1 MPI rank#2

72 Inner product: reduce-to-one implementation 1. Each rank computes a contribution to the local panel of (pr, pc) Cartesian rank NGX p O p i r j c = i r (G) j c (G) G 2. MPI_Reduce of local panel is performed to (pr, pc) Cartesian rank N X p O ir j c = p=1 O p i r j c 3. Steps 1-2 are repeated for all Cartesians rank of a 2D BLACS grid jc ir MPI rank#0 MPI rank#1 MPI rank#2

73 Inner product: reduce-to-one implementation synchronous zgemm MPI_Reduce asynchronous MPI_Wait zgemm MPI_Ireduce

74 Inner product: reduce-to-one implementation synchronous zgemm MPI_Reduce asynchronous MPI_Wait zgemm MPI_Ireduce A lot of MPI_Reduce! The resulting sub-matrices are small!

75 Inner product: block-allreduce implementation 1. Each rank computes a contribution to the block of global matrix O p i b j b = N p GX G i b (G) j b (G) 2. MPI_Allreduce of block is performed O ib j b = N p X p=1 O p i b j b 3. Each rank picks the local fraction of matrix elements from block O ib j b b=0 b=1 b=2 b=3 b=4 b=5 b=6 b=7 b=8

76 Inner product: block-allreduce implementation 1. Each rank computes a contribution to the block of global matrix O p i b j b = N p GX G i b (G) j b (G) 2. MPI_Allreduce of block is performed O ib j b = N p X p=1 O p i b j b 3. Each rank picks the local fraction of matrix elements from block O ib j b b=0 b=1 b=2 b=3 b=4 b=5 4. Steps 1-3 are repeated for all blocks b=6 b=7 b=8

77 Inner product: block-allreduce implementation synchronous zgemm MPI_Allreduce update local panels asynchronous MPI_Wait update local panels zgemm MPI_Ireduce overlap computation and communication single OMP thread is executing MPI_Allreduce and update of local panels remaining OMP threads execute zgemm

78 Example: communication and computation overlap omp_set_nested(1); /* allow nested OMP */ int nt = omp_get_max_threads(); /* get total number of threds */ #pragma omp parallel num_threads(2) shared(buf_state) /* spawn two threds */ { if (omp_get_thread_num() == 1) { /* thread #1 executes zgemms */ int s{0}; omp_set_num_threads(nt - 1); /* set number of nested threads to nt - 1 */ for (...) { /* loop over blocks */ int state = 1; while (state == 1) { /* wait for the release of the buffer */ #pragma omp atomic read state = buf_state[s % 2]; /* read from shared varray */ } zgemm(...,buf[s % 2]); /* execute local zgemm */ #pragma omp atomic write buf_state[s % 2] = 1; /* lock the buffer */ s++; } } else { /* thread #0 is doing communication */ int s{0}; for (...) { /* loop over blocks */ int state = 0; while (state == 0) { /* wait for the lock of the buffer */ #pragma omp atomic read state = buf_state[s % 2]; /* read from shared array */ } MPI_Allreduce(...,buf[s % 2]); /* execute global reduction */ /* do something with the result, for example, store it */ #pragma omp atomic write buf_state[s % 2] = 0; /* release the buffer */ s++; } } } omp_set_nested(0); /* forbid nested */

79 Inner product kernel benchmark Piz Daint, Intel SB (8 CPU cores) M=1 000 N=1 000 K=

80 Inner product kernel benchmark Piz Daint, Intel SB (8 CPU cores) M= N= K=

81 Wave-function transformation i(g) = X j j(g)z ji

82 Wave-function transformation i(g) = X j j(g)z ji Slab data distribution of and i(g) j(g) wave-function index plane-wave index G MPI rank #0 MPI rank #1 MPI rank #2 MPI rank #3

83 Wave-function transformation plane-wave index G Slab data distribution of and i(g) j(g) wave-function index MPI rank #0 MPI rank #1 MPI rank #2 MPI rank #3 i(g) = X j j(g)z ji 2D block-cyclic distribution of Zji band index i basis function index j

84 Wave-function transformation i0 i1 For each block of the transformation matrix: each MPI rank packs its contribution to the block elements of block are collected with MPI_Allgatherv elements are unpacked and the whole block is formed the updated of the wave-functions i 0 :i 1 (G) from the wave-functions j 0 :j 1 (G) is done by the local zgemm j0 j1

85 Wave-function transformation i0 i1 j0 j1 i0 i1 = j0 i(g) j(g) j1

86 Wave-function orthogonalization For the orthonormal basis { 0 j}, h 0 j functions { new j }, { 0 j} M { new j } = { 1 j} 0 j 0i = jj 0 the task is to expand it with the new such that h 1 1 j j 0i = jj 0

87 Wave-function orthogonalization For the orthonormal basis { 0 j}, h 0 j functions { new j }, { 0 j} M { new j } = { 1 j} 0 j 0i = jj 0 the task is to expand it with the new such that h 1 1 j j 0i = jj 0 Orthogonalize to the old basis functions ii = 1 X j 0 jih 0 j new i i = new i i X j 0 jih 0 j new i i inner product wave-function transformation

88 Wave-function orthogonalization For the orthonormal basis { 0 j}, h 0 j functions { new j }, { 0 j} M { new j } = { 1 j} Orthogonalize to the old basis functions ii = 1 X Orthonormalize new functions Oii0 j 0 jih 0 j 0 j 0i = jj 0 the task is to expand it with the new such that h 1 1 j j 0i = jj 0 new i = h i i0i i = new i i X j inner product 0 jih 0 j new i i inner product wave-function transformation O = LL H Cholesky decomposition new i i = X 1 i0ili 0 i i 0 wave-function transformation

89 Wave-function orthogonalization Test case: orthogonalization and transformation of i i, Ĥ i i, Ŝ ii for i 2 [1,N] orthogonalization and transformation of i i, Ĥ i i, Ŝ ii for i 2 [N +1, 2N] w.r.t to the first N wave-functions Upgraded Piz Daint, Intel HW (12 CPU cores) + P100 GPU, N=4096 K=

90 Wave-function orthogonalization Test case: orthogonalization and transformation of i i, Ĥ i i, Ŝ ii for i 2 [1,N] orthogonalization and transformation of i i, Ĥ i i, Ŝ ii for i 2 [N +1, 2N] w.r.t to the first N wave-functions Upgraded Piz Daint, Intel HW (12 CPU cores) + P100 GPU, N=4096 K= Average node performance Efficiency Performance (GFlop/s) Efficiency Number of nodes

91 Wave-function orthogonalization Test case: orthogonalization and transformation of orthogonalization and transformation of w.r.t to the first N wave-functions i i, H i i, i i, H i i, S i i for i 2 [N + 1, 2N ] S ii for i 2 [1, N ] 0 0 j S j 0 i Transformations: 0 1 j il H 0 1 j il S h 0 1 j S j 0 i Transformations: 1 ii H 0 1 j il X j 1 ii S j ih j S i i X j 1 ii H j ih j S i i X j S j ih j S i i h Cholesky factorization and inversion (CPU) h Cholesky factorization and inversion (CPU) Upgraded Piz Daint, Intel HW (12 CPU cores) + P100 GPU, N=4096 K= Transformations: 1 1 j S j 0 i 1 1 j il H 1 1 j il S 1 1 j il

92 Conclusion Domain specific libraries allow a separation of concerns between HPC and scientific communities Several compute-intensive kernels that are one level above the standard BLAS/LAPCK/ScaLAPACK/FFTW were identified MaX centre of excellence is working on the initial DSL implementation and API (WP4)

93 Thank you for your attention.

VASP: running on HPC resources. University of Vienna, Faculty of Physics and Center for Computational Materials Science, Vienna, Austria

VASP: running on HPC resources. University of Vienna, Faculty of Physics and Center for Computational Materials Science, Vienna, Austria VASP: running on HPC resources University of Vienna, Faculty of Physics and Center for Computational Materials Science, Vienna, Austria The Many-Body Schrödinger equation 0 @ 1 2 X i i + X i Ĥ (r 1,...,r

More information

The electronic structure of materials 2 - DFT

The electronic structure of materials 2 - DFT Quantum mechanics 2 - Lecture 9 December 19, 2012 1 Density functional theory (DFT) 2 Literature Contents 1 Density functional theory (DFT) 2 Literature Historical background The beginnings: L. de Broglie

More information

ELECTRONIC STRUCTURE CALCULATIONS FOR THE SOLID STATE PHYSICS

ELECTRONIC STRUCTURE CALCULATIONS FOR THE SOLID STATE PHYSICS FROM RESEARCH TO INDUSTRY 32 ème forum ORAP 10 octobre 2013 Maison de la Simulation, Saclay, France ELECTRONIC STRUCTURE CALCULATIONS FOR THE SOLID STATE PHYSICS APPLICATION ON HPC, BLOCKING POINTS, Marc

More information

DGDFT: A Massively Parallel Method for Large Scale Density Functional Theory Calculations

DGDFT: A Massively Parallel Method for Large Scale Density Functional Theory Calculations DGDFT: A Massively Parallel Method for Large Scale Density Functional Theory Calculations The recently developed discontinuous Galerkin density functional theory (DGDFT)[21] aims at reducing the number

More information

Comparing the Efficiency of Iterative Eigenvalue Solvers: the Quantum ESPRESSO experience

Comparing the Efficiency of Iterative Eigenvalue Solvers: the Quantum ESPRESSO experience Comparing the Efficiency of Iterative Eigenvalue Solvers: the Quantum ESPRESSO experience Stefano de Gironcoli Scuola Internazionale Superiore di Studi Avanzati Trieste-Italy 0 Diagonalization of the Kohn-Sham

More information

Self Consistent Cycle

Self Consistent Cycle Self Consistent Cycle Step 0 : defining your system namelist SYSTEM How to specify the System All periodic systems can be specified by a Bravais Lattice and and atomic basis How to specify the Bravais

More information

Weile Jia 1, Long Wang 1, Zongyan Cao 1, Jiyun Fu 1, Xuebin Chi 1, Weiguo Gao 2, Lin-Wang Wang 3

Weile Jia 1, Long Wang 1, Zongyan Cao 1, Jiyun Fu 1, Xuebin Chi 1, Weiguo Gao 2, Lin-Wang Wang 3 A plane wave pseudopotential density functional theory molecular dynamics code on multi-gpu machine - GPU Technology Conference, San Jose, May 17th, 2012 Weile Jia 1, Long Wang 1, Zongyan Cao 1, Jiyun

More information

Introduction to Parallelism in CASTEP

Introduction to Parallelism in CASTEP to ism in CASTEP Stewart Clark Band University of Durham 21 September 2012 Solve for all the bands/electrons (Band-) Band CASTEP solves the Kohn-Sham equations for electrons in a periodic array of nuclei:

More information

Density Functional Theory

Density Functional Theory Density Functional Theory Iain Bethune EPCC ibethune@epcc.ed.ac.uk Overview Background Classical Atomistic Simulation Essential Quantum Mechanics DFT: Approximations and Theory DFT: Implementation using

More information

Dept of Mechanical Engineering MIT Nanoengineering group

Dept of Mechanical Engineering MIT Nanoengineering group 1 Dept of Mechanical Engineering MIT Nanoengineering group » Recap of HK theorems and KS equations» The physical meaning of the XC energy» Solution of a one-particle Schroedinger equation» Pseudo Potentials»

More information

ELSI: A Unified Software Interface for Kohn-Sham Electronic Structure Solvers

ELSI: A Unified Software Interface for Kohn-Sham Electronic Structure Solvers ELSI: A Unified Software Interface for Kohn-Sham Electronic Structure Solvers Victor Yu and the ELSI team Department of Mechanical Engineering & Materials Science Duke University Kohn-Sham Density-Functional

More information

6: Plane waves, unit cells, k- points and all that

6: Plane waves, unit cells, k- points and all that The Nuts and Bolts of First-Principles Simulation 6: Plane waves, unit cells, k- points and all that Durham, 6th- 13th December 2001 CASTEP Developers Group with support from the ESF ψ k Network Overview

More information

The Conjugate Gradient Method

The Conjugate Gradient Method The Conjugate Gradient Method Classical Iterations We have a problem, We assume that the matrix comes from a discretization of a PDE. The best and most popular model problem is, The matrix will be as large

More information

Contents. Preface... xi. Introduction...

Contents. Preface... xi. Introduction... Contents Preface... xi Introduction... xv Chapter 1. Computer Architectures... 1 1.1. Different types of parallelism... 1 1.1.1. Overlap, concurrency and parallelism... 1 1.1.2. Temporal and spatial parallelism

More information

CP2K: the gaussian plane wave (GPW) method

CP2K: the gaussian plane wave (GPW) method CP2K: the gaussian plane wave (GPW) method Basis sets and Kohn-Sham energy calculation R. Vuilleumier Département de chimie Ecole normale supérieure Paris Tutorial CPMD-CP2K CPMD and CP2K CPMD CP2K http://www.cpmd.org

More information

Preconditioned Eigenvalue Solvers for electronic structure calculations. Andrew V. Knyazev. Householder Symposium XVI May 26, 2005

Preconditioned Eigenvalue Solvers for electronic structure calculations. Andrew V. Knyazev. Householder Symposium XVI May 26, 2005 1 Preconditioned Eigenvalue Solvers for electronic structure calculations Andrew V. Knyazev Department of Mathematics and Center for Computational Mathematics University of Colorado at Denver Householder

More information

Parallel Programming. Parallel algorithms Linear systems solvers

Parallel Programming. Parallel algorithms Linear systems solvers Parallel Programming Parallel algorithms Linear systems solvers Terminology System of linear equations Solve Ax = b for x Special matrices Upper triangular Lower triangular Diagonally dominant Symmetric

More information

Block Iterative Eigensolvers for Sequences of Dense Correlated Eigenvalue Problems

Block Iterative Eigensolvers for Sequences of Dense Correlated Eigenvalue Problems Mitglied der Helmholtz-Gemeinschaft Block Iterative Eigensolvers for Sequences of Dense Correlated Eigenvalue Problems Birkbeck University, London, June the 29th 2012 Edoardo Di Napoli Motivation and Goals

More information

The Plane-Wave Pseudopotential Method

The Plane-Wave Pseudopotential Method Hands-on Workshop on Density Functional Theory and Beyond: Computational Materials Science for Real Materials Trieste, August 6-15, 2013 The Plane-Wave Pseudopotential Method Ralph Gebauer ICTP, Trieste

More information

Plane waves, pseudopotentials and PAW. X. Gonze Université catholique de Louvain, Louvain-la-neuve, Belgium

Plane waves, pseudopotentials and PAW. X. Gonze Université catholique de Louvain, Louvain-la-neuve, Belgium Plane waves, pseudopotentials and PAW X. Gonze Université catholique de Louvain, Louvain-la-neuve, Belgium 1 Basic equations in DFT Solve self-consistently the Kohn-Sham equation H ψ n = ε n ψ n!!! ρ(r

More information

Poisson Solver, Pseudopotentials, Atomic Forces in the BigDFT code

Poisson Solver, Pseudopotentials, Atomic Forces in the BigDFT code CECAM Tutorial on Wavelets in DFT, CECAM - LYON,, in the BigDFT code Kernel Luigi Genovese L_Sim - CEA Grenoble 28 November 2007 Outline, Kernel 1 The with Interpolating Scaling Functions in DFT for Interpolating

More information

Key concepts in Density Functional Theory (II)

Key concepts in Density Functional Theory (II) Kohn-Sham scheme and band structures European Theoretical Spectroscopy Facility (ETSF) CNRS - Laboratoire des Solides Irradiés Ecole Polytechnique, Palaiseau - France Present Address: LPMCN Université

More information

Scalable Non-blocking Preconditioned Conjugate Gradient Methods

Scalable Non-blocking Preconditioned Conjugate Gradient Methods Scalable Non-blocking Preconditioned Conjugate Gradient Methods Paul Eller and William Gropp University of Illinois at Urbana-Champaign Department of Computer Science Supercomputing 16 Paul Eller and William

More information

The Plane-wave Pseudopotential Method

The Plane-wave Pseudopotential Method The Plane-wave Pseudopotential Method k(r) = X G c k,g e i(g+k) r Chris J Pickard Electrons in a Solid Nearly Free Electrons Nearly Free Electrons Nearly Free Electrons Electronic Structures Methods Empirical

More information

5.6. PSEUDOINVERSES 101. A H w.

5.6. PSEUDOINVERSES 101. A H w. 5.6. PSEUDOINVERSES 0 Corollary 5.6.4. If A is a matrix such that A H A is invertible, then the least-squares solution to Av = w is v = A H A ) A H w. The matrix A H A ) A H is the left inverse of A and

More information

Efficient algorithms for symmetric tensor contractions

Efficient algorithms for symmetric tensor contractions Efficient algorithms for symmetric tensor contractions Edgar Solomonik 1 Department of EECS, UC Berkeley Oct 22, 2013 1 / 42 Edgar Solomonik Symmetric tensor contractions 1/ 42 Motivation The goal is to

More information

5.1 Banded Storage. u = temperature. The five-point difference operator. uh (x, y + h) 2u h (x, y)+u h (x, y h) uh (x + h, y) 2u h (x, y)+u h (x h, y)

5.1 Banded Storage. u = temperature. The five-point difference operator. uh (x, y + h) 2u h (x, y)+u h (x, y h) uh (x + h, y) 2u h (x, y)+u h (x h, y) 5.1 Banded Storage u = temperature u= u h temperature at gridpoints u h = 1 u= Laplace s equation u= h u = u h = grid size u=1 The five-point difference operator 1 u h =1 uh (x + h, y) 2u h (x, y)+u h

More information

Electronic structure, plane waves and pseudopotentials

Electronic structure, plane waves and pseudopotentials Electronic structure, plane waves and pseudopotentials P.J. Hasnip Spectroscopy Workshop 2009 We want to be able to predict what electrons and nuclei will do from first principles, without needing to know

More information

ESLW_Drivers July 2017

ESLW_Drivers July 2017 ESLW_Drivers 10-21 July 2017 Volker Blum - ELSI Viktor Yu - ELSI William Huhn - ELSI David Lopez - Siesta Yann Pouillon - Abinit Micael Oliveira Octopus & Abinit Fabiano Corsetti Siesta & Onetep Paolo

More information

Chapter 3. The (L)APW+lo Method. 3.1 Choosing A Basis Set

Chapter 3. The (L)APW+lo Method. 3.1 Choosing A Basis Set Chapter 3 The (L)APW+lo Method 3.1 Choosing A Basis Set The Kohn-Sham equations (Eq. (2.17)) provide a formulation of how to practically find a solution to the Hohenberg-Kohn functional (Eq. (2.15)). Nevertheless

More information

ab initio Electronic Structure Calculations

ab initio Electronic Structure Calculations ab initio Electronic Structure Calculations New scalability frontiers using the BG/L Supercomputer C. Bekas, A. Curioni and W. Andreoni IBM, Zurich Research Laboratory Rueschlikon 8803, Switzerland ab

More information

Why use pseudo potentials?

Why use pseudo potentials? Pseudo potentials Why use pseudo potentials? Reduction of basis set size effective speedup of calculation Reduction of number of electrons reduces the number of degrees of freedom For example in Pt: 10

More information

9.1 Preconditioned Krylov Subspace Methods

9.1 Preconditioned Krylov Subspace Methods Chapter 9 PRECONDITIONING 9.1 Preconditioned Krylov Subspace Methods 9.2 Preconditioned Conjugate Gradient 9.3 Preconditioned Generalized Minimal Residual 9.4 Relaxation Method Preconditioners 9.5 Incomplete

More information

Institut Néel Institut Laue Langevin. Introduction to electronic structure calculations

Institut Néel Institut Laue Langevin. Introduction to electronic structure calculations Institut Néel Institut Laue Langevin Introduction to electronic structure calculations 1 Institut Néel - 25 rue des Martyrs - Grenoble - France 2 Institut Laue Langevin - 71 avenue des Martyrs - Grenoble

More information

CHEM6085: Density Functional Theory

CHEM6085: Density Functional Theory Lecture 11 CHEM6085: Density Functional Theory DFT for periodic crystalline solids C.-K. Skylaris 1 Electron in a one-dimensional periodic box (in atomic units) Schrödinger equation Energy eigenvalues

More information

The Fast Multipole Method in molecular dynamics

The Fast Multipole Method in molecular dynamics The Fast Multipole Method in molecular dynamics Berk Hess KTH Royal Institute of Technology, Stockholm, Sweden ADAC6 workshop Zurich, 20-06-2018 Slide BioExcel Slide Molecular Dynamics of biomolecules

More information

Pseudopotential methods for DFT calculations

Pseudopotential methods for DFT calculations Pseudopotential methods for DFT calculations Lorenzo Paulatto Scuola Internazionale Superiore di Studi Avanzati and CNR-INFM DEMOCRITOS National Simulation Center Tieste Italy July 9, 2008 Outline pseudopotential

More information

Matrix Factorization and Analysis

Matrix Factorization and Analysis Chapter 7 Matrix Factorization and Analysis Matrix factorizations are an important part of the practice and analysis of signal processing. They are at the heart of many signal-processing algorithms. Their

More information

Lab 1: Iterative Methods for Solving Linear Systems

Lab 1: Iterative Methods for Solving Linear Systems Lab 1: Iterative Methods for Solving Linear Systems January 22, 2017 Introduction Many real world applications require the solution to very large and sparse linear systems where direct methods such as

More information

Massive Parallelization of First Principles Molecular Dynamics Code

Massive Parallelization of First Principles Molecular Dynamics Code Massive Parallelization of First Principles Molecular Dynamics Code V Hidemi Komatsu V Takahiro Yamasaki V Shin-ichi Ichikawa (Manuscript received April 16, 2008) PHASE is a first principles molecular

More information

Introduction to First-Principles Method

Introduction to First-Principles Method Joint ICTP/CAS/IAEA School & Workshop on Plasma-Materials Interaction in Fusion Devices, July 18-22, 2016, Hefei Introduction to First-Principles Method by Guang-Hong LU ( 吕广宏 ) Beihang University Computer

More information

Accuracy benchmarking of DFT results, domain libraries for electrostatics, hybrid functional and solvation

Accuracy benchmarking of DFT results, domain libraries for electrostatics, hybrid functional and solvation Accuracy benchmarking of DFT results, domain libraries for electrostatics, hybrid functional and solvation Stefan Goedecker Stefan.Goedecker@unibas.ch http://comphys.unibas.ch/ Multi-wavelets: High accuracy

More information

Parallel Numerics, WT 2016/ Iterative Methods for Sparse Linear Systems of Equations. page 1 of 1

Parallel Numerics, WT 2016/ Iterative Methods for Sparse Linear Systems of Equations. page 1 of 1 Parallel Numerics, WT 2016/2017 5 Iterative Methods for Sparse Linear Systems of Equations page 1 of 1 Contents 1 Introduction 1.1 Computer Science Aspects 1.2 Numerical Problems 1.3 Graphs 1.4 Loop Manipulations

More information

FINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION

FINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION FINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION EDMOND CHOW AND AFTAB PATEL Abstract. This paper presents a new fine-grained parallel algorithm for computing an incomplete LU factorization. All nonzeros

More information

Jacobi-Davidson Eigensolver in Cusolver Library. Lung-Sheng Chien, NVIDIA

Jacobi-Davidson Eigensolver in Cusolver Library. Lung-Sheng Chien, NVIDIA Jacobi-Davidson Eigensolver in Cusolver Library Lung-Sheng Chien, NVIDIA lchien@nvidia.com Outline CuSolver library - cusolverdn: dense LAPACK - cusolversp: sparse LAPACK - cusolverrf: refactorization

More information

Cyclops Tensor Framework

Cyclops Tensor Framework Cyclops Tensor Framework Edgar Solomonik Department of EECS, Computer Science Division, UC Berkeley March 17, 2014 1 / 29 Edgar Solomonik Cyclops Tensor Framework 1/ 29 Definition of a tensor A rank r

More information

Quantum ESPRESSO Performance Benchmark and Profiling. February 2017

Quantum ESPRESSO Performance Benchmark and Profiling. February 2017 Quantum ESPRESSO Performance Benchmark and Profiling February 2017 2 Note The following research was performed under the HPC Advisory Council activities Compute resource - HPC Advisory Council Cluster

More information

Practical Guide to Density Functional Theory (DFT)

Practical Guide to Density Functional Theory (DFT) Practical Guide to Density Functional Theory (DFT) Brad Malone, Sadas Shankar Quick recap of where we left off last time BD Malone, S Shankar Therefore there is a direct one-to-one correspondence between

More information

A knowledge-based approach to high-performance computing in ab initio simulations.

A knowledge-based approach to high-performance computing in ab initio simulations. Mitglied der Helmholtz-Gemeinschaft A knowledge-based approach to high-performance computing in ab initio simulations. AICES Advisory Board Meeting. July 14th 2014 Edoardo Di Napoli Academic background

More information

Petascale Quantum Simulations of Nano Systems and Biomolecules

Petascale Quantum Simulations of Nano Systems and Biomolecules Petascale Quantum Simulations of Nano Systems and Biomolecules Emil Briggs North Carolina State University 1. Outline of real-space Multigrid (RMG) 2. Scalability and hybrid/threaded models 3. GPU acceleration

More information

Electronic Structure Calculations, Density Functional Theory and its Modern Implementations

Electronic Structure Calculations, Density Functional Theory and its Modern Implementations Tutoriel Big RENOBLE Electronic Structure Calculations, Density Functional Theory and its Modern Implementations Thierry Deutsch L_Sim - CEA renoble 19 October 2011 Outline 1 of Atomistic calculations

More information

7.2 Steepest Descent and Preconditioning

7.2 Steepest Descent and Preconditioning 7.2 Steepest Descent and Preconditioning Descent methods are a broad class of iterative methods for finding solutions of the linear system Ax = b for symmetric positive definite matrix A R n n. Consider

More information

Numerical Methods I Non-Square and Sparse Linear Systems

Numerical Methods I Non-Square and Sparse Linear Systems Numerical Methods I Non-Square and Sparse Linear Systems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 September 25th, 2014 A. Donev (Courant

More information

Large Scale Electronic Structure Calculations

Large Scale Electronic Structure Calculations Large Scale Electronic Structure Calculations Jürg Hutter University of Zurich 8. September, 2008 / Speedup08 CP2K Program System GNU General Public License Community Developers Platform on "Berlios" (cp2k.berlios.de)

More information

Fine-grained Parallel Incomplete LU Factorization

Fine-grained Parallel Incomplete LU Factorization Fine-grained Parallel Incomplete LU Factorization Edmond Chow School of Computational Science and Engineering Georgia Institute of Technology Sparse Days Meeting at CERFACS June 5-6, 2014 Contribution

More information

Applications of Randomized Methods for Decomposing and Simulating from Large Covariance Matrices

Applications of Randomized Methods for Decomposing and Simulating from Large Covariance Matrices Applications of Randomized Methods for Decomposing and Simulating from Large Covariance Matrices Vahid Dehdari and Clayton V. Deutsch Geostatistical modeling involves many variables and many locations.

More information

TR A Comparison of the Performance of SaP::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems

TR A Comparison of the Performance of SaP::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems TR-0-07 A Comparison of the Performance of ::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems Ang Li, Omkar Deshmukh, Radu Serban, Dan Negrut May, 0 Abstract ::GPU is a

More information

Fine-Grained Parallel Algorithms for Incomplete Factorization Preconditioning

Fine-Grained Parallel Algorithms for Incomplete Factorization Preconditioning Fine-Grained Parallel Algorithms for Incomplete Factorization Preconditioning Edmond Chow School of Computational Science and Engineering Georgia Institute of Technology, USA SPPEXA Symposium TU München,

More information

Cyclops Tensor Framework: reducing communication and eliminating load imbalance in massively parallel contractions

Cyclops Tensor Framework: reducing communication and eliminating load imbalance in massively parallel contractions Cyclops Tensor Framework: reducing communication and eliminating load imbalance in massively parallel contractions Edgar Solomonik 1, Devin Matthews 3, Jeff Hammond 4, James Demmel 1,2 1 Department of

More information

Conjugate-Gradient Eigenvalue Solvers in Computing Electronic Properties of Nanostructure Architectures

Conjugate-Gradient Eigenvalue Solvers in Computing Electronic Properties of Nanostructure Architectures Conjugate-Gradient Eigenvalue Solvers in Computing Electronic Properties of Nanostructure Architectures Stanimire Tomov 1, Julien Langou 1, Andrew Canning 2, Lin-Wang Wang 2, and Jack Dongarra 1 1 Innovative

More information

CRYSTAL in parallel: replicated and distributed (MPP) data. Why parallel?

CRYSTAL in parallel: replicated and distributed (MPP) data. Why parallel? CRYSTAL in parallel: replicated and distributed (MPP) data Roberto Orlando Dipartimento di Chimica Università di Torino Via Pietro Giuria 5, 10125 Torino (Italy) roberto.orlando@unito.it 1 Why parallel?

More information

Boundary Value Problems - Solving 3-D Finite-Difference problems Jacob White

Boundary Value Problems - Solving 3-D Finite-Difference problems Jacob White Introduction to Simulation - Lecture 2 Boundary Value Problems - Solving 3-D Finite-Difference problems Jacob White Thanks to Deepak Ramaswamy, Michal Rewienski, and Karen Veroy Outline Reminder about

More information

Numerical Linear and Multilinear Algebra in Quantum Tensor Networks

Numerical Linear and Multilinear Algebra in Quantum Tensor Networks Numerical Linear and Multilinear Algebra in Quantum Tensor Networks Konrad Waldherr October 20, 2013 Joint work with Thomas Huckle QCCC 2013, Prien, October 20, 2013 1 Outline Numerical (Multi-) Linear

More information

1. Hydrogen atom in a box

1. Hydrogen atom in a box 1. Hydrogen atom in a box Recall H atom problem, V(r) = -1/r e r exact answer solved by expanding in Gaussian basis set, had to solve secular matrix involving matrix elements of basis functions place atom

More information

Electron bands in crystals Pseudopotentials, Plane Waves, Local Orbitals

Electron bands in crystals Pseudopotentials, Plane Waves, Local Orbitals Electron bands in crystals Pseudopotentials, Plane Waves, Local Orbitals Richard M. Martin UIUC Lecture at Summer School Hands-on introduction to Electronic Structure Materials Computation Center University

More information

FINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION

FINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION FINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION EDMOND CHOW AND AFTAB PATEL Abstract. This paper presents a new fine-grained parallel algorithm for computing an incomplete LU factorization. All nonzeros

More information

Carlo Cavazzoni, HPC department, CINECA

Carlo Cavazzoni, HPC department, CINECA Large Scale Parallelism Carlo Cavazzoni, HPC department, CINECA Parallel Architectures Two basic architectural scheme: Distributed Memory Shared Memory Now most computers have a mixed architecture + accelerators

More information

Iterative Methods for Solving A x = b

Iterative Methods for Solving A x = b Iterative Methods for Solving A x = b A good (free) online source for iterative methods for solving A x = b is given in the description of a set of iterative solvers called templates found at netlib: http

More information

Wavelets for density functional calculations: Four families and three. applications

Wavelets for density functional calculations: Four families and three. applications Wavelets for density functional calculations: Four families and three Haar wavelets Daubechies wavelets: BigDFT code applications Stefan Goedecker Stefan.Goedecker@unibas.ch http://comphys.unibas.ch/ Interpolating

More information

Pseudopotentials for hybrid density functionals and SCAN

Pseudopotentials for hybrid density functionals and SCAN Pseudopotentials for hybrid density functionals and SCAN Jing Yang, Liang Z. Tan, Julian Gebhardt, and Andrew M. Rappe Department of Chemistry University of Pennsylvania Why do we need pseudopotentials?

More information

Parallel Asynchronous Hybrid Krylov Methods for Minimization of Energy Consumption. Langshi CHEN 1,2,3 Supervised by Serge PETITON 2

Parallel Asynchronous Hybrid Krylov Methods for Minimization of Energy Consumption. Langshi CHEN 1,2,3 Supervised by Serge PETITON 2 1 / 23 Parallel Asynchronous Hybrid Krylov Methods for Minimization of Energy Consumption Langshi CHEN 1,2,3 Supervised by Serge PETITON 2 Maison de la Simulation Lille 1 University CNRS March 18, 2013

More information

MODULE 2: QUANTUM MECHANICS. Practice: Quantum ESPRESSO

MODULE 2: QUANTUM MECHANICS. Practice: Quantum ESPRESSO MODULE 2: QUANTUM MECHANICS Practice: Quantum ESPRESSO I. What is Quantum ESPRESSO? 2 DFT software PW-DFT, PP, US-PP, PAW http://www.quantum-espresso.org FREE PW-DFT, PP, PAW http://www.abinit.org FREE

More information

Lecture 9: Numerical Linear Algebra Primer (February 11st)

Lecture 9: Numerical Linear Algebra Primer (February 11st) 10-725/36-725: Convex Optimization Spring 2015 Lecture 9: Numerical Linear Algebra Primer (February 11st) Lecturer: Ryan Tibshirani Scribes: Avinash Siravuru, Guofan Wu, Maosheng Liu Note: LaTeX template

More information

Numerical Linear Algebra Primer. Ryan Tibshirani Convex Optimization /36-725

Numerical Linear Algebra Primer. Ryan Tibshirani Convex Optimization /36-725 Numerical Linear Algebra Primer Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: proximal gradient descent Consider the problem min g(x) + h(x) with g, h convex, g differentiable, and h simple

More information

Parallelization Strategies for Density Matrix Renormalization Group algorithms on Shared-Memory Systems

Parallelization Strategies for Density Matrix Renormalization Group algorithms on Shared-Memory Systems Parallelization Strategies for Density Matrix Renormalization Group algorithms on Shared-Memory Systems G. Hager HPC Services, Computing Center Erlangen, Germany E. Jeckelmann Theoretical Physics, Univ.

More information

c 2015 Society for Industrial and Applied Mathematics

c 2015 Society for Industrial and Applied Mathematics SIAM J. SCI. COMPUT. Vol. 37, No. 2, pp. C169 C193 c 2015 Society for Industrial and Applied Mathematics FINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION EDMOND CHOW AND AFTAB PATEL Abstract. This paper

More information

Quantum Computing Lecture 2. Review of Linear Algebra

Quantum Computing Lecture 2. Review of Linear Algebra Quantum Computing Lecture 2 Review of Linear Algebra Maris Ozols Linear algebra States of a quantum system form a vector space and their transformations are described by linear operators Vector spaces

More information

Modelling and implementation of algorithms in applied mathematics using MPI

Modelling and implementation of algorithms in applied mathematics using MPI Modelling and implementation of algorithms in applied mathematics using MPI Lecture 3: Linear Systems: Simple Iterative Methods and their parallelization, Programming MPI G. Rapin Brazil March 2011 Outline

More information

Properties of Matrices and Operations on Matrices

Properties of Matrices and Operations on Matrices Properties of Matrices and Operations on Matrices A common data structure for statistical analysis is a rectangular array or matris. Rows represent individual observational units, or just observations,

More information

Basis sets for SIESTA. Emilio Artacho. Nanogune, Ikerbasque & DIPC, San Sebastian, Spain Cavendish Laboratory, University of Cambridge

Basis sets for SIESTA. Emilio Artacho. Nanogune, Ikerbasque & DIPC, San Sebastian, Spain Cavendish Laboratory, University of Cambridge Basis sets for SIESTA Emilio Artacho Nanogune, Ikerbasque & DIPC, San Sebastian, Spain Cavendish Laboratory, University of Cambridge Solving: Basis set Expand in terms of a finite set of basis functions

More information

Jae Heon Yun and Yu Du Han

Jae Heon Yun and Yu Du Han Bull. Korean Math. Soc. 39 (2002), No. 3, pp. 495 509 MODIFIED INCOMPLETE CHOLESKY FACTORIZATION PRECONDITIONERS FOR A SYMMETRIC POSITIVE DEFINITE MATRIX Jae Heon Yun and Yu Du Han Abstract. We propose

More information

Key concepts in Density Functional Theory (II) Silvana Botti

Key concepts in Density Functional Theory (II) Silvana Botti Kohn-Sham scheme, band structure and optical spectra European Theoretical Spectroscopy Facility (ETSF) CNRS - Laboratoire des Solides Irradiés Ecole Polytechnique, Palaiseau - France Temporary Address:

More information

SPARSE SOLVERS POISSON EQUATION. Margreet Nool. November 9, 2015 FOR THE. CWI, Multiscale Dynamics

SPARSE SOLVERS POISSON EQUATION. Margreet Nool. November 9, 2015 FOR THE. CWI, Multiscale Dynamics SPARSE SOLVERS FOR THE POISSON EQUATION Margreet Nool CWI, Multiscale Dynamics November 9, 2015 OUTLINE OF THIS TALK 1 FISHPACK, LAPACK, PARDISO 2 SYSTEM OVERVIEW OF CARTESIUS 3 POISSON EQUATION 4 SOLVERS

More information

Performance optimization of WEST and Qbox on Intel Knights Landing

Performance optimization of WEST and Qbox on Intel Knights Landing Performance optimization of WEST and Qbox on Intel Knights Landing Huihuo Zheng 1, Christopher Knight 1, Giulia Galli 1,2, Marco Govoni 1,2, and Francois Gygi 3 1 Argonne National Laboratory 2 University

More information

Projector augmented wave Implementation

Projector augmented wave Implementation Projector augmented wave Implementation Peter. E. Blöchl Institute for Theoretical Physics Clausthal University of Technology, Germany http://www.pt.tu-clausthal.de/atp/ 1 = Projector augmented wave +

More information

Lecture on First-principles Computations (14): The Linear Combination of Atomic Orbitals (LCAO) Method

Lecture on First-principles Computations (14): The Linear Combination of Atomic Orbitals (LCAO) Method Lecture on First-principles Computations (14): The Linear Combination of Atomic Orbitals (LCAO) Method 任新国 (Xinguo Ren) 中国科学技术大学量子信息重点实验室 Key Laboratory of Quantum Information, USTC Hefei, 2016.11.11 Recall:

More information

The Linearized Augmented Planewave (LAPW) Method

The Linearized Augmented Planewave (LAPW) Method The Linearized Augmented Planewave (LAPW) Method David J. Singh Oak Ridge National Laboratory E T [ ]=T s [ ]+E ei [ ]+E H [ ]+E xc [ ]+E ii {T s +V ks [,r]} I (r)= i i (r) Need tools that are reliable

More information

Comparison of various abinitio codes used in periodic calculations

Comparison of various abinitio codes used in periodic calculations Comparison of various abinitio codes used in periodic calculations 1 Prof.P. Ravindran, Department of Physics, Central University of Tamil Nadu, India & Center for Materials Science and Nanotechnology,

More information

CME342 Parallel Methods in Numerical Analysis. Matrix Computation: Iterative Methods II. Sparse Matrix-vector Multiplication.

CME342 Parallel Methods in Numerical Analysis. Matrix Computation: Iterative Methods II. Sparse Matrix-vector Multiplication. CME342 Parallel Methods in Numerical Analysis Matrix Computation: Iterative Methods II Outline: CG & its parallelization. Sparse Matrix-vector Multiplication. 1 Basic iterative methods: Ax = b r = b Ax

More information

PART II : Least-Squares Approximation

PART II : Least-Squares Approximation PART II : Least-Squares Approximation Basic theory Let U be an inner product space. Let V be a subspace of U. For any g U, we look for a least-squares approximation of g in the subspace V min f V f g 2,

More information

Density Functional Theory. Martin Lüders Daresbury Laboratory

Density Functional Theory. Martin Lüders Daresbury Laboratory Density Functional Theory Martin Lüders Daresbury Laboratory Ab initio Calculations Hamiltonian: (without external fields, non-relativistic) impossible to solve exactly!! Electrons Nuclei Electron-Nuclei

More information

Pseudopotential generation and test by the ld1.x atomic code: an introduction

Pseudopotential generation and test by the ld1.x atomic code: an introduction and test by the ld1.x atomic code: an introduction SISSA and DEMOCRITOS Trieste (Italy) Outline 1 2 3 Spherical symmetry - I The Kohn and Sham (KS) equation is (in atomic units): [ 1 ] 2 2 + V ext (r)

More information

Numerical simulation of the Gross-Pitaevskii equation by pseudo-spectral and finite element methods comparison of GPS code and FreeFem++

Numerical simulation of the Gross-Pitaevskii equation by pseudo-spectral and finite element methods comparison of GPS code and FreeFem++ . p.1/13 10 ec. 2014, LJLL, Paris FreeFem++ workshop : BECASIM session Numerical simulation of the Gross-Pitaevskii equation by pseudo-spectral and finite element methods comparison of GPS code and FreeFem++

More information

Accelerating and Scaling Lanczos Diagonalization with GPGPU

Accelerating and Scaling Lanczos Diagonalization with GPGPU Accelerating and Scaling Lanczos Diagonalization with GPGPU Bill Brouwer, Filippo Spiga, Pierre-Yves Taunay, Sreejith GJ Nvidia GTC 2013 Outline Introduction Applications QE FQHE Theory Diagonalization

More information

First example, moments of inertia. I m,n = m i r 2 i m,n. m i r i,m r i,n. Symmetric second rank tensor x. i =1

First example, moments of inertia. I m,n = m i r 2 i m,n. m i r i,m r i,n. Symmetric second rank tensor x. i =1 Eigenvalue Problems Eigenvalue problems arise in many contexts in physics. In matrix form, Ax = x This is somewhat different from our previous SLE, which had the form Ax = b where A, b were assumed known.

More information

Recent implementations, applications, and extensions of the Locally Optimal Block Preconditioned Conjugate Gradient method LOBPCG

Recent implementations, applications, and extensions of the Locally Optimal Block Preconditioned Conjugate Gradient method LOBPCG MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Recent implementations, applications, and extensions of the Locally Optimal Block Preconditioned Conjugate Gradient method LOBPCG Knyazev,

More information

Practical calculations using first-principles QM Convergence, convergence, convergence

Practical calculations using first-principles QM Convergence, convergence, convergence Practical calculations using first-principles QM Convergence, convergence, convergence Keith Refson STFC Rutherford Appleton Laboratory September 18, 2007 Results of First-Principles Simulations..........................................................

More information

MAA507, Power method, QR-method and sparse matrix representation.

MAA507, Power method, QR-method and sparse matrix representation. ,, and representation. February 11, 2014 Lecture 7: Overview, Today we will look at:.. If time: A look at representation and fill in. Why do we need numerical s? I think everyone have seen how time consuming

More information

Fundamentals and applications of Density Functional Theory Astrid Marthinsen PhD candidate, Department of Materials Science and Engineering

Fundamentals and applications of Density Functional Theory Astrid Marthinsen PhD candidate, Department of Materials Science and Engineering Fundamentals and applications of Density Functional Theory Astrid Marthinsen PhD candidate, Department of Materials Science and Engineering Outline PART 1: Fundamentals of Density functional theory (DFT)

More information

Introduction to DFTB. Marcus Elstner. July 28, 2006

Introduction to DFTB. Marcus Elstner. July 28, 2006 Introduction to DFTB Marcus Elstner July 28, 2006 I. Non-selfconsistent solution of the KS equations DFT can treat up to 100 atoms in routine applications, sometimes even more and about several ps in MD

More information