Domain specific libraries. Material science codes on innovative HPC architectures Anton Kozhevnikov, CSCS December 5, 2016
1 Domain specific libraries Material science codes on innovative HPC architectures Anton Kozhevnikov, CSCS December 5, 2016
2 Part 1: Introduction
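3 Kohn-Sham equations
Eigenvalue problem: $\left(-\frac{1}{2}\Delta + v_{eff}(\mathbf{r})\right)\psi_j(\mathbf{r}) = \varepsilon_j \psi_j(\mathbf{r})$
Effective potential construction: $v_{eff}(\mathbf{r}) = \int \frac{\rho(\mathbf{r}')}{|\mathbf{r}' - \mathbf{r}|}\,d\mathbf{r}' + v_{XC}[\rho](\mathbf{r}) + v_{ext}(\mathbf{r})$
Density generation: $\rho^{new}(\mathbf{r}) = \sum_j |\psi_j(\mathbf{r})|^2$
Density mixing: $\rho(\mathbf{r}) = \alpha\,\rho^{new}(\mathbf{r}) + (1-\alpha)\,\rho^{old}(\mathbf{r})$
Output:
- wave-functions $\psi_j(\mathbf{r})$ and eigen-energies $\varepsilon_j$
- charge density $\rho(\mathbf{r})$ and magnetization $\mathbf{m}(\mathbf{r})$
- total energy $E_{tot}$ and atomic forces $\mathbf{F}$
Derived objects:
- spectral functions
- linear response functions
- phonons, thermodynamic functions
- parameters of lattice models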
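The density mixing step can be illustrated with a few lines of C++. This is a minimal sketch, assuming the density is stored as a plain array of real-space grid values; the function name `mix_density` and the parameter name `alpha` are illustrative and not taken from any particular code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Linear density mixing: rho = alpha * rho_new + (1 - alpha) * rho_old.
// `alpha` is the mixing parameter; both densities live on the same grid.
std::vector<double> mix_density(const std::vector<double>& rho_new,
                                const std::vector<double>& rho_old,
                                double alpha)
{
    assert(rho_new.size() == rho_old.size());
    std::vector<double> rho(rho_new.size());
    for (std::size_t ir = 0; ir < rho.size(); ++ir) {
        rho[ir] = alpha * rho_new[ir] + (1.0 - alpha) * rho_old[ir];
    }
    return rho;
}
```

A small `alpha` damps the update and usually makes the self-consistency loop more stable at the cost of more iterations.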
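4 Basis expansion
Expand Kohn-Sham wave functions in a convenient basis set: $\psi_j(\mathbf{r}) = \sum_\mu C_{\mu j}\,\phi_\mu(\mathbf{r})$
Convert the differential equation to a generalized matrix eigenvalue problem: $H C_j = \varepsilon_j O C_j$
where
$H_{\mu\mu'} = \int \phi^*_\mu(\mathbf{r}) \left(-\frac{1}{2}\Delta + v_{eff}(\mathbf{r})\right) \phi_{\mu'}(\mathbf{r})\,d\mathbf{r}$
$O_{\mu\mu'} = \int \phi^*_\mu(\mathbf{r})\,\phi_{\mu'}(\mathbf{r})\,d\mathbf{r}$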
5 Electronic structure codes
Basis set: Code
- plane-waves: Quantum ESPRESSO, VASP, ABINIT, PEtot, CPMD, QBOX, CASTEP
- localized atom-centered orbitals: SIESTA, CRYSTAL, FHI-AIMS, FPLO, OpenMX, TB-LMTO
- gaussians + plane-waves: CP2K
- wavelets: BigDFT
- linearized augmented plane-waves: WIEN2k, FLEUR, Exciting, Elk
- real space grid / plane waves / atomic orbitals: GPAW, Octopus
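6 Plane-wave basis
Plane-wave with the wave-vector $\mathbf{G}$: $e^{i\mathbf{G}\mathbf{r}} = e^{i(xG_x + yG_y + zG_z)} = \cos(\mathbf{G}\mathbf{r}) + i\sin(\mathbf{G}\mathbf{r})$, with wavelength $\lambda = 2\pi/|\mathbf{G}|$
- Complete orthogonal basis
- Eigen-functions of the Laplace operator: $\Delta e^{i\mathbf{G}\mathbf{r}} = -G^2 e^{i\mathbf{G}\mathbf{r}}$
- Fourier transformations can be rapidly evaluated with FFT
- Real-space integrals between periodic functions are represented as sums: $\int f^*(\mathbf{r})\,g(\mathbf{r})\,d\mathbf{r} = \frac{\Omega}{N}\sum_j f^*(\mathbf{r}_j)\,g(\mathbf{r}_j) = \Omega\sum_{\mathbf{G}} f^*(\mathbf{G})\,g(\mathbf{G})$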
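The equivalence of the real-space and reciprocal-space sums can be checked numerically. The sketch below is an illustration only: it uses a naive O(N^2) one-dimensional transform instead of an FFT, and it assumes the normalization $f(G) = \frac{1}{N}\sum_j f(r_j)e^{-iGr_j}$ with $\Omega$ the cell volume; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Naive forward transform: f(G_k) = (1/N) sum_j f(r_j) exp(-i G_k r_j),
// with G_k r_j = 2*pi*k*j/N on a uniform 1D grid.
std::vector<cplx> to_pw(const std::vector<cplx>& f_r)
{
    const std::size_t n = f_r.size();
    const double two_pi = 2.0 * std::acos(-1.0);
    std::vector<cplx> f_g(n);
    for (std::size_t k = 0; k < n; ++k) {
        cplx sum(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            const double phase = -two_pi * static_cast<double>(k * j) / static_cast<double>(n);
            sum += f_r[j] * std::polar(1.0, phase);
        }
        f_g[k] = sum / static_cast<double>(n);
    }
    return f_g;
}

// Real-space form: (Omega / N) * sum_j conj(f(r_j)) * g(r_j).
cplx inner_real_space(const std::vector<cplx>& f, const std::vector<cplx>& g, double omega)
{
    assert(f.size() == g.size());
    cplx sum(0.0, 0.0);
    for (std::size_t j = 0; j < f.size(); ++j) sum += std::conj(f[j]) * g[j];
    return sum * (omega / static_cast<double>(f.size()));
}

// Reciprocal-space form: Omega * sum_G conj(f(G)) * g(G).
cplx inner_pw(const std::vector<cplx>& f_g, const std::vector<cplx>& g_g, double omega)
{
    assert(f_g.size() == g_g.size());
    cplx sum(0.0, 0.0);
    for (std::size_t k = 0; k < f_g.size(); ++k) sum += std::conj(f_g[k]) * g_g[k];
    return sum * omega;
}
```

Calling `inner_real_space(f, g, omega)` and `inner_pw(to_pw(f), to_pw(g), omega)` on the same data should agree to rounding error.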
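7 Pseudopotential
Plane-wave basis requires a pseudopotential.
Unlike the true (all-electron) potential, the pseudopotential has an additional nonlocal term:
$\hat{V}_{PS} = V_{loc}(\mathbf{r}) + \sum_\alpha \sum_{\xi\xi'} |\beta^\alpha_\xi\rangle D^\alpha_{\xi\xi'} \langle\beta^\alpha_{\xi'}|$
Action of the pseudopotential on the wave-function:
$\hat{V}_{PS}\psi(\mathbf{r}) = V_{loc}(\mathbf{r})\psi(\mathbf{r}) + \sum_\alpha \sum_{\xi\xi'} \beta^\alpha_\xi(\mathbf{r})\,D^\alpha_{\xi\xi'} \int \beta^{\alpha*}_{\xi'}(\mathbf{r}')\,\psi(\mathbf{r}')\,d\mathbf{r}'$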
9 Eigen-value problem
$\hat{H}\psi_j = \varepsilon_j\psi_j$ in case of norm-conserving pseudopotential
$\hat{H}\psi_j = \varepsilon_j\hat{S}\psi_j$ in case of ultrasoft pseudopotential
H and S are dense Hermitian matrices of large dimension (~1000 PW / atom).
Iterative subspace diagonalization methods are used:
- Conjugate gradient
- Davidson algorithm
- RMM-DIIS
- LOBPCG
- Chebyshev filtering method
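12 Hamiltonian operator application
Any iterative solver requires the operation $\hat{H}\psi_j$
Result of Hamiltonian application: $h_{\psi_j}(\mathbf{G}) = [\hat{H}\psi_j](\mathbf{G}) = \int e^{-i\mathbf{G}\mathbf{r}}\,\hat{H}\psi_j(\mathbf{r})\,d\mathbf{r}$
Hamiltonian operator with pseudopotential: $\hat{H} = -\frac{1}{2}\Delta + v_{eff}(\mathbf{r}) + \sum_\alpha \sum_{\xi\xi'} |\beta^\alpha_\xi\rangle D^\alpha_{\xi\xi'} \langle\beta^\alpha_{\xi'}|$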
14 Hamiltonian operator application
Application of the local part of potential (Vloc kernel):
$\psi_j(\mathbf{G}) \xrightarrow{FFT^{-1}} \psi_j(\mathbf{r}) \rightarrow v_{eff}(\mathbf{r})\,\psi_j(\mathbf{r}) \xrightarrow{FFT} h_{\psi_j}(\mathbf{G})$

    do j = 1, num_psi
      ! transform psi to real-space domain
      call inverse_fft(psi(:,j), psi_of_r(:))
      ! multiply by effective potential
      do ir = 1, num_r_points
        psi_of_r(ir) = psi_of_r(ir) * veff(ir)
      end do
      ! transform back to PW domain
      call forward_fft(psi_of_r(:), hpsi(:,j))
    end do

Two FFTs for each wave-function!
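16 Hamiltonian operator application
Application of Laplace operator:
$-\frac{1}{2}\Delta\psi_j(\mathbf{r}) = -\frac{1}{2}\Delta\sum_{\mathbf{G}} e^{i\mathbf{G}\mathbf{r}}\psi_j(\mathbf{G}) = \sum_{\mathbf{G}} e^{i\mathbf{G}\mathbf{r}}\,\frac{G^2}{2}\,\psi_j(\mathbf{G})$

    do j = 1, num_psi
      ! add kinetic energy contribution
      do ig = 1, num_gvec
        hpsi(ig, j) = hpsi(ig, j) + pw_ekin(ig) * psi(ig, j)
      end do
    end do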
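The local-potential and kinetic-energy steps can be combined in one small, self-contained sketch. It is a simplification under stated assumptions: one dimension, a naive O(N^2) transform standing in for an FFT library, and the same number of points in the plane-wave and real-space domains (no cutoff sphere). The names `dft` and `apply_h` are hypothetical; `veff_r` and `pw_ekin` mirror `veff` and `pw_ekin` from the Fortran loops.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Naive discrete Fourier transform used in place of an FFT.
// sign = +1: psi(r_j) = sum_k psi(G_k) exp(+i G_k r_j)          (PW -> real space)
// sign = -1: psi(G_k) = (1/N) sum_j psi(r_j) exp(-i G_k r_j)    (real space -> PW)
std::vector<cplx> dft(const std::vector<cplx>& in, int sign)
{
    const std::size_t n = in.size();
    const double two_pi = 2.0 * std::acos(-1.0);
    std::vector<cplx> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cplx sum(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            const double phase = sign * two_pi * static_cast<double>(k * j) / static_cast<double>(n);
            sum += in[j] * std::polar(1.0, phase);
        }
        out[k] = (sign < 0) ? sum / static_cast<double>(n) : sum;
    }
    return out;
}

// h(G) = [FFT(veff * FFT^{-1}(psi))](G) + pw_ekin(G) * psi(G)
std::vector<cplx> apply_h(const std::vector<cplx>& psi_g,
                          const std::vector<double>& veff_r,
                          const std::vector<double>& pw_ekin)
{
    assert(psi_g.size() == veff_r.size() && psi_g.size() == pw_ekin.size());
    // transform psi to the real-space domain and multiply by the potential
    std::vector<cplx> psi_r = dft(psi_g, +1);
    for (std::size_t ir = 0; ir < psi_r.size(); ++ir) psi_r[ir] *= veff_r[ir];
    // transform back to the PW domain and add the kinetic energy contribution
    std::vector<cplx> hpsi = dft(psi_r, -1);
    for (std::size_t ig = 0; ig < hpsi.size(); ++ig) hpsi[ig] += pw_ekin[ig] * psi_g[ig];
    return hpsi;
}
```

For a constant potential the result reduces to `(veff + pw_ekin[ig]) * psi_g[ig]`, which makes a convenient sanity check.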
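18 Hamiltonian operator application
Application of non-local part of potential:
$\sum_\alpha \sum_{\xi\xi'} \beta^\alpha_\xi(\mathbf{G})\,D^\alpha_{\xi\xi'} \sum_{\mathbf{G}'} \beta^{\alpha*}_{\xi'}(\mathbf{G}')\,\psi_j(\mathbf{G}') \rightarrow h_{\psi_j}(\mathbf{G})$
The sum over $\mathbf{G}'$ is zgemm#1, the multiplication by $D$ is zgemm#2, and the multiplication by $\beta(\mathbf{G})$ is zgemm#3.

    ! zone = (1.0d0, 0.0d0), zzero = (0.0d0, 0.0d0)
    ! leading dimensions assume beta_pw(num_pw, num_beta_tot),
    ! psi(num_pw, num_psi), hpsi(num_pw, num_psi),
    ! beta_psi(num_beta_tot, num_psi), d_beta_psi(num_beta_tot, num_psi)
    ! compute <beta|psi>
    call zgemm('C', 'N', num_beta_tot, num_psi, num_pw, zone, beta_pw, num_pw, &
               psi, num_pw, zzero, beta_psi, num_beta_tot)
    ! apply block-diagonal D matrix
    do ia = 1, num_atoms
      call zgemm('N', 'N', num_beta(ia), num_psi, num_beta(ia), zone, &
                 D(1, 1, ia), size(D, 1), beta_psi(offset(ia), 1), num_beta_tot, &
                 zzero, d_beta_psi(offset(ia), 1), num_beta_tot)
    end do
    ! compute beta * D * <beta|psi> and add it to hpsi
    call zgemm('N', 'N', num_pw, num_psi, num_beta_tot, zone, beta_pw, num_pw, &
               d_beta_psi, num_beta_tot, zone, hpsi, num_pw)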
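19 Hamiltonian operator application
Application of non-local part of potential, shown as matrix products:
[Figure: $h_{\psi_j}(\mathbf{G})$ += $\beta(\mathbf{G}) \cdot D \cdot \langle\beta|\psi_j\rangle$. Rows of $h_{\psi_j}$, $\beta$ and $\psi_j$ run over the plane-wave index G, columns of $\beta$ run over the index of atom and orbital, and $D$ is block-diagonal with one block ($D_0$, $D_1$, $D_2$, $D_3$) per atom.]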
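The three matrix products can be written out with plain loops to make the data flow explicit. The following is a sketch only, not a replacement for zgemm: the `Matrix` helper and the function `apply_nonlocal` are hypothetical, storage is column-major as in the Fortran version, and the result is accumulated into `hpsi`.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Column-major matrix: element (i, j) is stored at data[i + j * rows].
struct Matrix {
    std::size_t rows, cols;
    std::vector<cplx> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, cplx(0.0, 0.0)) {}
    cplx& operator()(std::size_t i, std::size_t j) { return data[i + j * rows]; }
    const cplx& operator()(std::size_t i, std::size_t j) const { return data[i + j * rows]; }
};

// hpsi += beta * D * (beta^H * psi), with D block-diagonal over atoms.
// beta: num_pw x num_beta_tot; psi, hpsi: num_pw x num_psi;
// d[ia]: square block of atom ia.
void apply_nonlocal(const Matrix& beta, const std::vector<Matrix>& d,
                    const Matrix& psi, Matrix& hpsi)
{
    const std::size_t num_pw = beta.rows, num_beta_tot = beta.cols, num_psi = psi.cols;
    assert(psi.rows == num_pw && hpsi.rows == num_pw && hpsi.cols == num_psi);

    // Step 1 (zgemm#1): beta_psi = beta^H * psi
    Matrix beta_psi(num_beta_tot, num_psi);
    for (std::size_t j = 0; j < num_psi; ++j)
        for (std::size_t xi = 0; xi < num_beta_tot; ++xi)
            for (std::size_t ig = 0; ig < num_pw; ++ig)
                beta_psi(xi, j) += std::conj(beta(ig, xi)) * psi(ig, j);

    // Step 2 (zgemm#2): d_beta_psi = D * beta_psi, one diagonal block per atom
    Matrix d_beta_psi(num_beta_tot, num_psi);
    std::size_t offset = 0;
    for (const Matrix& da : d) {
        assert(da.rows == da.cols && offset + da.rows <= num_beta_tot);
        for (std::size_t j = 0; j < num_psi; ++j)
            for (std::size_t xi = 0; xi < da.rows; ++xi)
                for (std::size_t xi2 = 0; xi2 < da.cols; ++xi2)
                    d_beta_psi(offset + xi, j) += da(xi, xi2) * beta_psi(offset + xi2, j);
        offset += da.rows;
    }
    assert(offset == num_beta_tot);

    // Step 3 (zgemm#3): hpsi += beta * d_beta_psi
    for (std::size_t j = 0; j < num_psi; ++j)
        for (std::size_t xi = 0; xi < num_beta_tot; ++xi)
            for (std::size_t ig = 0; ig < num_pw; ++ig)
                hpsi(ig, j) += beta(ig, xi) * d_beta_psi(xi, j);
}
```

Because the number of projectors is much smaller than the number of plane waves, the intermediate matrices `beta_psi` and `d_beta_psi` stay small; only the first and last products touch the full plane-wave dimension.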
21 Davidson iterative solver
We want to solve: $\hat{H}\psi_j = \varepsilon_j\psi_j$
We know how to compute: $h_{\psi_j} = \hat{H}\psi_j$
Key idea of Davidson iterative solver: start with a subspace spanned by a guess to $\psi_j$ and expand it with preconditioned residuals.
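28 Davidson iterative solver
- Initialize the trial basis set: $\phi^0_j \Leftarrow \psi_j$
- Apply Hamiltonian to the basis functions: $h^m_{\phi_j} = \hat{H}\phi^m_j$
- Compute reduced Hamiltonian matrix: $\tilde{h}^m_{jj'} = \sum_{\mathbf{G}} \phi^{m*}_j(\mathbf{G})\,h^m_{\phi_{j'}}(\mathbf{G})$
- Diagonalize reduced Hamiltonian matrix and get N lowest eigen pairs: $\tilde{h}^m Z^m_j = \varepsilon_j Z^m_j$
- Compute residuals ($R_j = \hat{H}\psi_j - \varepsilon_j\psi_j$): $R^m_j = \sum_{j'} \left(h^m_{\phi_{j'}} - \varepsilon_j\,\phi^m_{j'}\right) Z^m_{j'j}$
- Apply preconditioner to the unconverged residuals, orthogonalize and add the resulting functions to the basis: $\{\phi^{m+1}_j\} = \{\phi^m_j\} \oplus \{\hat{P}R^m_j\}$
- Recompute the wave-functions: $\psi_j = \sum_{j'} \phi^m_{j'} Z^m_{j'j}$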
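29 Davidson iterative solver
Initialize subspace basis functions and apply Hamiltonian: $\phi^0_j$, $h^0_{\phi_j} = \hat{H}\phi^0_j$
[Figure: $\phi^0$ and $h^0_\phi$ as matrices with rows running over the plane-wave index G]
30 Davidson iterative solver
Compute reduced Hamiltonian matrix: $\tilde{h}^0_{jj'} = \langle\phi^0_j | h^0_{\phi_{j'}}\rangle$
Solve the eigen-value problem: $\tilde{h}^0 Z^0_j = \varepsilon_j Z^0_j$
31 Davidson iterative solver
Compute residuals: $R^0_j = \sum_{j'} \left(h^0_{\phi_{j'}} - \varepsilon_j\,\phi^0_{j'}\right) Z^0_{j'j}$
Apply preconditioner: $\hat{P}R^0_j(\mathbf{G}) = (H_{\mathbf{G}\mathbf{G}} - \varepsilon_j)^{-1} R^0_j(\mathbf{G})$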
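The residual and the diagonal preconditioner can be sketched for a single wave-function as follows. This is an illustration under assumptions: `hpsi` already holds $\hat{H}\psi$, `h_diag` holds the diagonal elements $H_{\mathbf{G}\mathbf{G}}$, and the guard `min_denom` against a vanishing denominator is a hypothetical addition that is not part of the formula above.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// R(G) = h(G) - eps * psi(G), where h = H psi has already been computed.
std::vector<cplx> residual(const std::vector<cplx>& hpsi,
                           const std::vector<cplx>& psi, double eps)
{
    assert(hpsi.size() == psi.size());
    std::vector<cplx> r(psi.size());
    for (std::size_t ig = 0; ig < psi.size(); ++ig) r[ig] = hpsi[ig] - eps * psi[ig];
    return r;
}

// Diagonal preconditioner: [P R](G) = R(G) / (H_GG - eps).
// Denominators smaller in magnitude than min_denom are clamped (assumption).
std::vector<cplx> precondition(const std::vector<cplx>& r,
                               const std::vector<double>& h_diag,
                               double eps, double min_denom = 1e-8)
{
    assert(r.size() == h_diag.size());
    std::vector<cplx> pr(r.size());
    for (std::size_t ig = 0; ig < r.size(); ++ig) {
        double denom = h_diag[ig] - eps;
        if (std::abs(denom) < min_denom) denom = (denom < 0.0) ? -min_denom : min_denom;
        pr[ig] = r[ig] / denom;
    }
    return pr;
}
```

The preconditioned residuals returned by `precondition` are the functions that get orthogonalized and appended to the trial basis in the next step.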
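32 Davidson iterative solver
Expand variational space and apply Hamiltonian to new basis functions: $\phi^1_j = \phi^0_j \oplus \hat{P}R^0_j$, $h^1_{\phi_j} = \hat{H}\phi^1_j$
[Figure: $\phi^1$ and $h^1_\phi$ as matrices with rows running over the plane-wave index G]
33 Davidson iterative solver
Compute reduced Hamiltonian matrix: $\tilde{h}^1_{jj'} = \langle\phi^1_j | h^1_{\phi_{j'}}\rangle$
Solve the eigen-value problem: $\tilde{h}^1 Z^1_j = \varepsilon_j Z^1_j$
34 Davidson iterative solver
Compute residuals: $R^1_j = \sum_{j'} \left(h^1_{\phi_{j'}} - \varepsilon_j\,\phi^1_{j'}\right) Z^1_{j'j}$
Apply preconditioner: $\hat{P}R^1_j(\mathbf{G}) = (H_{\mathbf{G}\mathbf{G}} - \varepsilon_j)^{-1} R^1_j(\mathbf{G})$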
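36 Davidson iterative solver
Continue to expand variational space: $\phi^2_j = \phi^1_j \oplus \hat{P}R^1_j$, $h^2_{\phi_j} = \hat{H}\phi^2_j$
[Figure: $\phi^2$ and $h^2_\phi$ as matrices with rows running over the plane-wave index G]
Iterate until convergence is reached (all residuals are zero).
37 Davidson iterative solver
Recompute the wave-functions: $\psi_j = \sum_{j'} \phi^m_{j'} Z^m_{j'j}$
Take $\varepsilon_j$ from the last subspace diagonalization.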
38 Part 2: Implementation details
40 Common operations
- FFTs (Vloc kernel, density summation)
- Subspace Hamiltonian construction
- Wave-functions and residuals update
- Wave-function orthogonalization
Need scalable parallel implementation!
41 Wave-function storage and distribution
Wave-functions are stored as a matrix: each column $\psi_i(\mathbf{G})$ of the matrix represents a single wave-function.
- Columns run over the band index i. Number of bands: ~100 to 10K
- Rows run over the plane-wave index G. Number of plane-waves: ~100K to 10M
42 Wave-function storage and distribution
How to distribute the matrix of plane-wave coefficients?
- Split bands
- Split G-vectors
- Split bands and G-vectors
- 2D block-cyclic distribution
[Figure: four panels, one per distribution, each showing the matrix with band index i along columns and plane-wave index G along rows]
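47 FFT kernel
For each band i = 1, ..., Nb:
$\psi_i(\mathbf{G}) \xrightarrow{FFT^{-1}} \psi_i(\mathbf{r}) \rightarrow \psi_i(\mathbf{r})\,V_{loc}(\mathbf{r}) \xrightarrow{FFT} [\psi_i V](\mathbf{G})$
Wave-functions in the plane-wave domain are defined inside a sphere with a given energy cutoff: $\psi_i(\mathbf{G}) \neq 0$ for $|\mathbf{G}| \le G_{max}$; $\frac{G^2_{max}}{2} = E_{cut}$
FFT box for the Vloc kernel typically circumscribes a sphere of radius $2G_{max}$.
[Figure: sphere of G-vectors inside the FFT box, axes x, y, z]
48 FFT decomposition to 1D and 2D
The full 3D transformation is decomposed into a 1D transformation along the z-direction and a 2D transformation in the xy-plane:
$\psi(G_x, G_y, G_z) \xrightarrow{FFT^{-1}(z)} \psi(G_x, G_y, z) \xrightarrow{FFT^{-1}(xy)} \psi(x, y, z)$
$\psi(x, y, z) \xrightarrow{FFT(xy)} \psi(G_x, G_y, z) \xrightarrow{FFT(z)} \psi(G_x, G_y, G_z)$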
51 FFT parallelization
In real space the z-dimension of the FFT buffer is split among MPI ranks.
In reciprocal space z-sticks of the G-vector sphere are distributed among MPI ranks such that:
- the local number of z-sticks is roughly equal
- the sum of lengths of z-sticks (local number of G-vectors) is roughly equal
[Figure: FFT box with axes x, y, z; slabs along z assigned to MPI rank#0, MPI rank#1, MPI rank#2]
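One simple way to approach such a balance is a greedy assignment: sort the z-sticks by length and always give the next stick to the rank that currently holds the fewest G-vectors. The sketch below illustrates this heuristic only; it is an assumption for illustration, not necessarily the scheme used in any particular code, and the name `distribute_sticks` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Assign z-sticks to ranks. stick_len[s] is the number of G-vectors in stick s.
// Returns owner[s], the rank that owns stick s.
std::vector<int> distribute_sticks(const std::vector<int>& stick_len, int num_ranks)
{
    assert(num_ranks > 0);
    std::vector<std::size_t> order(stick_len.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    // Longest sticks first; stable sort keeps the result deterministic.
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) { return stick_len[a] > stick_len[b]; });

    std::vector<long> load(static_cast<std::size_t>(num_ranks), 0L);
    std::vector<int> owner(stick_len.size(), -1);
    for (std::size_t s : order) {
        // Rank with the fewest G-vectors so far (lowest rank wins ties).
        const std::size_t r = static_cast<std::size_t>(
            std::min_element(load.begin(), load.end()) - load.begin());
        owner[s] = static_cast<int>(r);
        load[r] += stick_len[s];
    }
    return owner;
}
```

For stick lengths `{5, 3, 3, 2, 2, 1}` and two ranks this assigns eight G-vectors and three sticks to each rank.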
54 Parallel transformation of z-sticks
- each rank executes 1D transformations ($FFT^{-1}(z)$, $FFT(z)$) of the local fraction of z-sticks
- z-sticks are swapped between MPI ranks using MPI_Alltoall
[Figure: ranks p0, p1, p2 hold a {full-z, partial-xy} distribution of a cylinder; MPI_Alltoall converts it to a {partial-z, full-xy} distribution of a cylinder and back]
58 Data redistribution
Using all available MPI ranks for a single FFT is not efficient!
Remap wave-functions to a 2D MPI grid where one dimension is dedicated to a parallel FFT and the second dimension is used to distribute the wave-function band index.
[Figure: slab distribution over Np ranks p0 to p7 (band index i vs plane-wave index G) is remapped by MPI A2A between Nc ranks to a 2D MPI grid of Nr x Nc ranks]
Groups of Nc ranks contain the fat slab of plane-wave coefficients for all bands.
59 FFT Vloc kernel
1. Change wave-function distribution from slab to 2D grid (MPI_A2A between Nc ranks)
2. Execute 1D backward FFTs along z for the local set of z-sticks
3. Swap z-columns (MPI_A2A between Nr ranks)
4. Execute 2D backward FFTs in the xy-plane
5. Multiply by effective potential
6. Execute 2D forward FFTs in the xy-plane
7. Swap z-columns (MPI_A2A between Nr ranks)
8. Execute 1D forward FFTs along z for the local set of z-sticks
9. Change wave-function distribution from 2D grid to slab (MPI_A2A between Nc ranks)
60 FFT kernel benchmark on Piz Daint
Number of bands is 3000, number of times Vloc is applied is 4, number of G-vectors is ~9.2M, FFT grid size is
[Figure: total Vloc time, FFT time (sec.) and the corresponding total Vloc and FFT efficiency vs number of nodes]
63 Inner product
$O_{ij} = \sum_{\mathbf{G}} \psi^*_i(\mathbf{G})\,\psi_j(\mathbf{G})$
- Slab data distribution of $\psi_i(\mathbf{G})$: the plane-wave index G is split among MPI rank #0, #1, #2, #3
- 2D block-cyclic distribution of $O_{ij}$ over band index i and band index j
68 Inner product: simple implementation
1. Each rank computes a contribution to the global matrix: $O^p_{ij} = \sum_{\mathbf{G}}^{N^p_G} \psi^*_i(\mathbf{G})\,\psi_j(\mathbf{G})$
2. MPI_Allreduce of global matrix is performed: $O_{ij} = \sum_{p=1}^{N_p} O^p_{ij}$
3. Each rank picks the local fraction of matrix elements from $O_{ij}$
[Figure: partial matrices of MPI rank#0 and MPI rank#1 are summed into the global matrix]
Requires MPI_Allreduce of global matrix! Each MPI rank receives a lot of unnecessary data!
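Steps 1 and 2 can be emulated serially to show what each rank contributes. In the sketch below the loop over `p` plays the role of the MPI ranks and the element-wise sum stands in for MPI_Allreduce; the contiguous split of G-vectors and the function names are assumptions made for illustration, and no MPI call is actually made.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// psi is column-major, num_pw x num_bands: psi[ig + i * num_pw].
// Partial overlap of "rank" p over its slab of G-vectors [g_begin, g_end):
// O^p_ij = sum_{G in slab} conj(psi_i(G)) * psi_j(G), stored at o[i + j * num_bands].
std::vector<cplx> partial_overlap(const std::vector<cplx>& psi, std::size_t num_pw,
                                  std::size_t num_bands, std::size_t g_begin, std::size_t g_end)
{
    assert(psi.size() == num_pw * num_bands && g_begin <= g_end && g_end <= num_pw);
    std::vector<cplx> o(num_bands * num_bands, cplx(0.0, 0.0));
    for (std::size_t j = 0; j < num_bands; ++j)
        for (std::size_t i = 0; i < num_bands; ++i)
            for (std::size_t ig = g_begin; ig < g_end; ++ig)
                o[i + j * num_bands] += std::conj(psi[ig + i * num_pw]) * psi[ig + j * num_pw];
    return o;
}

// Serial emulation: every "rank" computes O^p, the sum over ranks emulates MPI_Allreduce.
std::vector<cplx> overlap_slab(const std::vector<cplx>& psi, std::size_t num_pw,
                               std::size_t num_bands, std::size_t num_ranks)
{
    assert(num_ranks > 0);
    std::vector<cplx> o(num_bands * num_bands, cplx(0.0, 0.0));
    for (std::size_t p = 0; p < num_ranks; ++p) {
        // Contiguous slab of G-vectors owned by rank p.
        const std::size_t g_begin = p * num_pw / num_ranks;
        const std::size_t g_end = (p + 1) * num_pw / num_ranks;
        const std::vector<cplx> op = partial_overlap(psi, num_pw, num_bands, g_begin, g_end);
        for (std::size_t k = 0; k < o.size(); ++k) o[k] += op[k];
    }
    return o;
}
```

The full `num_bands x num_bands` matrix is summed on every rank, which is exactly the communication volume the slide warns about.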
72 Inner product: reduce-to-one implementation
1. Each rank computes a contribution to the local panel of the $(p_r, p_c)$ Cartesian rank: $O^p_{i_r j_c} = \sum_{\mathbf{G}}^{N^p_G} \psi^*_{i_r}(\mathbf{G})\,\psi_{j_c}(\mathbf{G})$
2. MPI_Reduce of the local panel is performed to the $(p_r, p_c)$ Cartesian rank: $O_{i_r j_c} = \sum_{p=1}^{N_p} O^p_{i_r j_c}$
3. Steps 1-2 are repeated for all Cartesian ranks of a 2D BLACS grid
[Figure: panel $(i_r, j_c)$ of the matrix reduced from MPI rank#0, MPI rank#1, MPI rank#2]
74 Inner product: reduce-to-one implementation
- synchronous: zgemm, MPI_Reduce
- asynchronous: zgemm, MPI_Ireduce, MPI_Wait
A lot of MPI_Reduce! The resulting sub-matrices are small!
76 Inner product: block-allreduce implementation
1. Each rank computes a contribution to the block of global matrix: $O^p_{i_b j_b} = \sum_{\mathbf{G}}^{N^p_G} \psi^*_{i_b}(\mathbf{G})\,\psi_{j_b}(\mathbf{G})$
2. MPI_Allreduce of block is performed: $O_{i_b j_b} = \sum_{p=1}^{N_p} O^p_{i_b j_b}$
3. Each rank picks the local fraction of matrix elements from block $O_{i_b j_b}$
4. Steps 1-3 are repeated for all blocks
[Figure: global matrix split into blocks b=0 to b=8]
77 Inner product: block-allreduce implementation
- synchronous: zgemm, MPI_Allreduce, update local panels
- asynchronous: zgemm, MPI_Ireduce, MPI_Wait, update local panels
- overlap computation and communication: a single OMP thread is executing MPI_Allreduce and update of local panels, the remaining OMP threads execute zgemm
78 Example: communication and computation overlap

    omp_set_nested(1);               /* allow nested OMP */
    int nt = omp_get_max_threads();  /* get total number of threads */
    #pragma omp parallel num_threads(2) shared(buf_state) /* spawn two threads */
    {
        if (omp_get_thread_num() == 1) { /* thread #1 executes zgemms */
            int s{0};
            omp_set_num_threads(nt - 1); /* set number of nested threads to nt - 1 */
            for (...) { /* loop over blocks */
                int state = 1;
                while (state == 1) { /* wait for the release of the buffer */
                    #pragma omp atomic read
                    state = buf_state[s % 2]; /* read from shared array */
                }
                zgemm(..., buf[s % 2]); /* execute local zgemm */
                #pragma omp atomic write
                buf_state[s % 2] = 1; /* lock the buffer */
                s++;
            }
        } else { /* thread #0 is doing communication */
            int s{0};
            for (...) { /* loop over blocks */
                int state = 0;
                while (state == 0) { /* wait for the lock of the buffer */
                    #pragma omp atomic read
                    state = buf_state[s % 2]; /* read from shared array */
                }
                MPI_Allreduce(..., buf[s % 2]); /* execute global reduction */
                /* do something with the result, for example, store it */
                #pragma omp atomic write
                buf_state[s % 2] = 0; /* release the buffer */
                s++;
            }
        }
    }
    omp_set_nested(0); /* forbid nested */
79 Inner product kernel benchmark Piz Daint, Intel SB (8 CPU cores) M=1 000 N=1 000 K=
80 Inner product kernel benchmark Piz Daint, Intel SB (8 CPU cores) M= N= K=
83 Wave-function transformation
$\psi_i(\mathbf{G}) = \sum_j \phi_j(\mathbf{G})\,Z_{ji}$
- Slab data distribution of $\psi_i(\mathbf{G})$ and $\phi_j(\mathbf{G})$: the plane-wave index G is split among MPI rank #0, #1, #2, #3
- 2D block-cyclic distribution of $Z_{ji}$ over band index i and basis function index j
84 Wave-function transformation
For each block $(j_0:j_1, i_0:i_1)$ of the transformation matrix:
- each MPI rank packs its contribution to the block
- elements of the block are collected with MPI_Allgatherv
- elements are unpacked and the whole block is formed
- the update of the wave-functions $\psi_{i_0:i_1}(\mathbf{G})$ from the wave-functions $\phi_{j_0:j_1}(\mathbf{G})$ is done by the local zgemm
85 Wave-function transformation
[Figure: columns $i_0:i_1$ of $\psi_i(\mathbf{G})$ = columns $j_0:j_1$ of $\phi_j(\mathbf{G})$ times the block $(j_0:j_1, i_0:i_1)$ of the transformation matrix]
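88 Wave-function orthogonalization
For the orthonormal basis $\{\phi^0_j\}$, $\langle\phi^0_j|\phi^0_{j'}\rangle = \delta_{jj'}$, the task is to expand it with the new functions $\{\phi^{new}_j\}$, $\{\phi^0_j\} \oplus \{\phi^{new}_j\} = \{\phi^1_j\}$, such that $\langle\phi^1_j|\phi^1_{j'}\rangle = \delta_{jj'}$.
Orthogonalize to the old basis functions:
$|\tilde\phi^{new}_i\rangle = \left(1 - \sum_j |\phi^0_j\rangle\langle\phi^0_j|\right)|\phi^{new}_i\rangle = |\phi^{new}_i\rangle - \sum_j |\phi^0_j\rangle\langle\phi^0_j|\phi^{new}_i\rangle$
- $\langle\phi^0_j|\phi^{new}_i\rangle$: inner product
- sum over j: wave-function transformation
Orthonormalize new functions:
- $O_{ii'} = \langle\tilde\phi^{new}_i|\tilde\phi^{new}_{i'}\rangle$: inner product
- $O = LL^H$: Cholesky decomposition
- $|\tilde{\tilde\phi}^{new}_i\rangle = \sum_{i'} |\tilde\phi^{new}_{i'}\rangle\,(L^{-H})_{i'i}$: wave-function transformation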
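The orthonormalization of the new functions (inner product, Cholesky decomposition, transformation) can be sketched serially as below. This is a minimal illustration under assumptions: the functions are stored column-major in a single array, the Cholesky factorization is written out by hand instead of calling LAPACK, and the transformation with $L^{-H}$ is done by a triangular solve rather than an explicit inversion. The name `orthonormalize` is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// phi is column-major, num_pw x n: phi[ig + i * num_pw].
// On return the n columns are orthonormal.
void orthonormalize(std::vector<cplx>& phi, std::size_t num_pw, std::size_t n)
{
    assert(phi.size() == num_pw * n);

    // Inner product: O_ij = sum_G conj(phi_i(G)) * phi_j(G), stored at o[i + j * n].
    std::vector<cplx> o(n * n, cplx(0.0, 0.0));
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t ig = 0; ig < num_pw; ++ig)
                o[i + j * n] += std::conj(phi[ig + i * num_pw]) * phi[ig + j * num_pw];

    // Cholesky decomposition O = L L^H with lower-triangular L.
    std::vector<cplx> l(n * n, cplx(0.0, 0.0));
    for (std::size_t j = 0; j < n; ++j) {
        double diag = o[j + j * n].real();
        for (std::size_t k = 0; k < j; ++k) diag -= std::norm(l[j + k * n]);
        assert(diag > 0.0); // fails if the functions are linearly dependent
        l[j + j * n] = std::sqrt(diag);
        for (std::size_t i = j + 1; i < n; ++i) {
            cplx s = o[i + j * n];
            for (std::size_t k = 0; k < j; ++k) s -= l[i + k * n] * std::conj(l[j + k * n]);
            l[i + j * n] = s / l[j + j * n];
        }
    }

    // Transformation phi <- phi * L^{-H}: solve phi_new * L^H = phi column by column,
    // phi_new_i = (phi_i - sum_{k<i} phi_new_k * conj(L_ik)) / L_ii.
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < i; ++k)
            for (std::size_t ig = 0; ig < num_pw; ++ig)
                phi[ig + i * num_pw] -= phi[ig + k * num_pw] * std::conj(l[i + k * n]);
        for (std::size_t ig = 0; ig < num_pw; ++ig) phi[ig + i * num_pw] /= l[i + i * n];
    }
}
```

In a distributed setting the first loop corresponds to the inner product kernel and the last loop to the wave-function transformation kernel, while the small n x n factorization can stay on the CPU.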
90 Wave-function orthogonalization
Test case:
- orthogonalization and transformation of $|\psi_i\rangle$, $\hat{H}|\psi_i\rangle$, $\hat{S}|\psi_i\rangle$ for $i \in [1, N]$
- orthogonalization and transformation of $|\psi_i\rangle$, $\hat{H}|\psi_i\rangle$, $\hat{S}|\psi_i\rangle$ for $i \in [N+1, 2N]$ w.r.t. the first N wave-functions
Upgraded Piz Daint, Intel HW (12 CPU cores) + P100 GPU, N=4096 K=
[Figure: average node performance (GFlop/s) and efficiency vs number of nodes]
91 Wave-function orthogonalization
Same test case and hardware as on the previous slide (N=4096 K=), broken down into kernels.
For $i \in [1, N]$:
- inner product $\langle\psi_j|\hat{S}|\psi_{j'}\rangle$
- Cholesky factorization and inversion (CPU)
- transformations of $|\psi_i\rangle$, $\hat{H}|\psi_i\rangle$, $\hat{S}|\psi_i\rangle$ with $L^{-1}$
For $i \in [N+1, 2N]$:
- projection against the first N wave-functions: $|\psi_i\rangle - \sum_j |\psi_j\rangle\langle\psi_j|\hat{S}|\psi_i\rangle$, $\hat{H}|\psi_i\rangle - \sum_j \hat{H}|\psi_j\rangle\langle\psi_j|\hat{S}|\psi_i\rangle$, $\hat{S}|\psi_i\rangle - \sum_j \hat{S}|\psi_j\rangle\langle\psi_j|\hat{S}|\psi_i\rangle$
- inner product $\langle\psi_j|\hat{S}|\psi_{j'}\rangle$ of the projected functions
- Cholesky factorization and inversion (CPU)
- transformations of $|\psi_i\rangle$, $\hat{H}|\psi_i\rangle$, $\hat{S}|\psi_i\rangle$ with $L^{-1}$
92 Conclusion
- Domain specific libraries allow a separation of concerns between HPC and scientific communities
- Several compute-intensive kernels that are one level above the standard BLAS/LAPACK/ScaLAPACK/FFTW were identified
- MaX centre of excellence is working on the initial DSL implementation and API (WP4)
93 Thank you for your attention.
More informationAccuracy benchmarking of DFT results, domain libraries for electrostatics, hybrid functional and solvation
Accuracy benchmarking of DFT results, domain libraries for electrostatics, hybrid functional and solvation Stefan Goedecker Stefan.Goedecker@unibas.ch http://comphys.unibas.ch/ Multi-wavelets: High accuracy
More informationParallel Numerics, WT 2016/ Iterative Methods for Sparse Linear Systems of Equations. page 1 of 1
Parallel Numerics, WT 2016/2017 5 Iterative Methods for Sparse Linear Systems of Equations page 1 of 1 Contents 1 Introduction 1.1 Computer Science Aspects 1.2 Numerical Problems 1.3 Graphs 1.4 Loop Manipulations
More informationFINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION
FINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION EDMOND CHOW AND AFTAB PATEL Abstract. This paper presents a new fine-grained parallel algorithm for computing an incomplete LU factorization. All nonzeros
More informationJacobi-Davidson Eigensolver in Cusolver Library. Lung-Sheng Chien, NVIDIA
Jacobi-Davidson Eigensolver in Cusolver Library Lung-Sheng Chien, NVIDIA lchien@nvidia.com Outline CuSolver library - cusolverdn: dense LAPACK - cusolversp: sparse LAPACK - cusolverrf: refactorization
More informationCyclops Tensor Framework
Cyclops Tensor Framework Edgar Solomonik Department of EECS, Computer Science Division, UC Berkeley March 17, 2014 1 / 29 Edgar Solomonik Cyclops Tensor Framework 1/ 29 Definition of a tensor A rank r
More informationQuantum ESPRESSO Performance Benchmark and Profiling. February 2017
Quantum ESPRESSO Performance Benchmark and Profiling February 2017 2 Note The following research was performed under the HPC Advisory Council activities Compute resource - HPC Advisory Council Cluster
More informationPractical Guide to Density Functional Theory (DFT)
Practical Guide to Density Functional Theory (DFT) Brad Malone, Sadas Shankar Quick recap of where we left off last time BD Malone, S Shankar Therefore there is a direct one-to-one correspondence between
More informationA knowledge-based approach to high-performance computing in ab initio simulations.
Mitglied der Helmholtz-Gemeinschaft A knowledge-based approach to high-performance computing in ab initio simulations. AICES Advisory Board Meeting. July 14th 2014 Edoardo Di Napoli Academic background
More informationPetascale Quantum Simulations of Nano Systems and Biomolecules
Petascale Quantum Simulations of Nano Systems and Biomolecules Emil Briggs North Carolina State University 1. Outline of real-space Multigrid (RMG) 2. Scalability and hybrid/threaded models 3. GPU acceleration
More informationElectronic Structure Calculations, Density Functional Theory and its Modern Implementations
Tutoriel Big RENOBLE Electronic Structure Calculations, Density Functional Theory and its Modern Implementations Thierry Deutsch L_Sim - CEA renoble 19 October 2011 Outline 1 of Atomistic calculations
More information7.2 Steepest Descent and Preconditioning
7.2 Steepest Descent and Preconditioning Descent methods are a broad class of iterative methods for finding solutions of the linear system Ax = b for symmetric positive definite matrix A R n n. Consider
More informationNumerical Methods I Non-Square and Sparse Linear Systems
Numerical Methods I Non-Square and Sparse Linear Systems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 September 25th, 2014 A. Donev (Courant
More informationLarge Scale Electronic Structure Calculations
Large Scale Electronic Structure Calculations Jürg Hutter University of Zurich 8. September, 2008 / Speedup08 CP2K Program System GNU General Public License Community Developers Platform on "Berlios" (cp2k.berlios.de)
More informationFine-grained Parallel Incomplete LU Factorization
Fine-grained Parallel Incomplete LU Factorization Edmond Chow School of Computational Science and Engineering Georgia Institute of Technology Sparse Days Meeting at CERFACS June 5-6, 2014 Contribution
More informationApplications of Randomized Methods for Decomposing and Simulating from Large Covariance Matrices
Applications of Randomized Methods for Decomposing and Simulating from Large Covariance Matrices Vahid Dehdari and Clayton V. Deutsch Geostatistical modeling involves many variables and many locations.
More informationTR A Comparison of the Performance of SaP::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems
TR-0-07 A Comparison of the Performance of ::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems Ang Li, Omkar Deshmukh, Radu Serban, Dan Negrut May, 0 Abstract ::GPU is a
More informationFine-Grained Parallel Algorithms for Incomplete Factorization Preconditioning
Fine-Grained Parallel Algorithms for Incomplete Factorization Preconditioning Edmond Chow School of Computational Science and Engineering Georgia Institute of Technology, USA SPPEXA Symposium TU München,
More informationCyclops Tensor Framework: reducing communication and eliminating load imbalance in massively parallel contractions
Cyclops Tensor Framework: reducing communication and eliminating load imbalance in massively parallel contractions Edgar Solomonik 1, Devin Matthews 3, Jeff Hammond 4, James Demmel 1,2 1 Department of
More informationConjugate-Gradient Eigenvalue Solvers in Computing Electronic Properties of Nanostructure Architectures
Conjugate-Gradient Eigenvalue Solvers in Computing Electronic Properties of Nanostructure Architectures Stanimire Tomov 1, Julien Langou 1, Andrew Canning 2, Lin-Wang Wang 2, and Jack Dongarra 1 1 Innovative
More informationCRYSTAL in parallel: replicated and distributed (MPP) data. Why parallel?
CRYSTAL in parallel: replicated and distributed (MPP) data Roberto Orlando Dipartimento di Chimica Università di Torino Via Pietro Giuria 5, 10125 Torino (Italy) roberto.orlando@unito.it 1 Why parallel?
More informationBoundary Value Problems - Solving 3-D Finite-Difference problems Jacob White
Introduction to Simulation - Lecture 2 Boundary Value Problems - Solving 3-D Finite-Difference problems Jacob White Thanks to Deepak Ramaswamy, Michal Rewienski, and Karen Veroy Outline Reminder about
More informationNumerical Linear and Multilinear Algebra in Quantum Tensor Networks
Numerical Linear and Multilinear Algebra in Quantum Tensor Networks Konrad Waldherr October 20, 2013 Joint work with Thomas Huckle QCCC 2013, Prien, October 20, 2013 1 Outline Numerical (Multi-) Linear
More information1. Hydrogen atom in a box
1. Hydrogen atom in a box Recall H atom problem, V(r) = -1/r e r exact answer solved by expanding in Gaussian basis set, had to solve secular matrix involving matrix elements of basis functions place atom
More informationElectron bands in crystals Pseudopotentials, Plane Waves, Local Orbitals
Electron bands in crystals Pseudopotentials, Plane Waves, Local Orbitals Richard M. Martin UIUC Lecture at Summer School Hands-on introduction to Electronic Structure Materials Computation Center University