HECToR CSE technical meeting, Oxford Parallel Algorithms for the Materials Modelling code CRYSTAL

Similar documents
Advanced Quantum Chemistry III: Part 3. Haruyuki Nakano. Kyushu University

Quantum Mechanical Simulations

DFT calculations of NMR indirect spin spin coupling constants

An Approximate DFT Method: The Density-Functional Tight-Binding (DFTB) Method

Density Functional Theory - II part

Computational Methods. Chem 561

Institut Néel Institut Laue Langevin. Introduction to electronic structure calculations

Teoría del Funcional de la Densidad (Density Functional Theory)

Ab-initio Electronic Structure Calculations β and γ KNO 3 Energetic Materials

Electron Correlation

Exchange Correlation Functional Investigation of RT-TDDFT on a Sodium Chloride. Dimer. Philip Straughn

The Schrödinger equation for many-electron systems

DENSITY FUNCTIONAL THEORY FOR NON-THEORISTS JOHN P. PERDEW DEPARTMENTS OF PHYSICS AND CHEMISTRY TEMPLE UNIVERSITY

Computational Chemistry I

Introduction to Computational Quantum Chemistry: Theory

Self-Consistent Implementation of Self-Interaction Corrected DFT and of the Exact Exchange Functionals in Plane-Wave DFT

Electronic band structure, sx-lda, Hybrid DFT, LDA+U and all that. Keith Refson STFC Rutherford Appleton Laboratory

Introduction to Density Functional Theory

Oslo node. Highly accurate calculations benchmarking and extrapolations

Multi-reference Density Functional Theory. COLUMBUS Workshop Argonne National Laboratory 15 August 2005

CRYSTAL in parallel: replicated and distributed (MPP) data. Why parallel?

Density Functional Theory for Electrons in Materials

Same idea for polyatomics, keep track of identical atom e.g. NH 3 consider only valence electrons F(2s,2p) H(1s)

Density Func,onal Theory (Chapter 6, Jensen)

Introduction to Computational Chemistry Computational (chemistry education) and/or. (Computational chemistry) education

Orbital dependent correlation potentials in ab initio density functional theory

Finite-Temperature Hartree-Fock Exchange and Exchange- Correlation Free Energy Functionals. Travis Sjostrom. IPAM 2012 Workshop IV

Density Functional Theory. Martin Lüders Daresbury Laboratory

OVERVIEW OF QUANTUM CHEMISTRY METHODS

Density Functional Theory

Modified Becke-Johnson (mbj) exchange potential

Introduction to Computational Chemistry: Theory

Computational Modeling Software and their applications

Introduction to Density Functional Theory

DFT: Exchange-Correlation

CLIMBING THE LADDER OF DENSITY FUNCTIONAL APPROXIMATIONS JOHN P. PERDEW DEPARTMENT OF PHYSICS TEMPLE UNIVERSITY PHILADELPHIA, PA 19122

1 Density functional theory (DFT)

Basics of DFT. Kieron Burke and Lucas Wagner. Departments of Physics and of Chemistry, University of California, Irvine, CA 92697, USA

How to generate a pseudopotential with non-linear core corrections

ABC of ground-state DFT

Module 6 1. Density functional theory

2.1 Introduction: The many-body problem

Introduction to DFTB. Marcus Elstner. July 28, 2006

Fast and accurate Coulomb calculation with Gaussian functions

Bert de Groot, Udo Schmitt, Helmut Grubmüller

DFT: Exchange-Correlation

Study of Ozone in Tribhuvan University, Kathmandu, Nepal. Prof. S. Gurung Central Department of Physics, Tribhuvan University, Kathmandu, Nepal

Electronic structure theory: Fundamentals to frontiers. 2. Density functional theory

Exchange-Correlation Functional

Density functional theory in the solid state

Density Functional Theory

2 Electronic structure theory

Supporting Information for. Predicting the Stability of Fullerene Allotropes Throughout the Periodic Table MA 02139

Electronic Supplementary Information

Chem 4502 Introduction to Quantum Mechanics and Spectroscopy 3 Credits Fall Semester 2014 Laura Gagliardi. Lecture 28, December 08, 2014

AB INITIO MODELING OF ALKALI METAL CHALCOGENIDES USING SOGGA THEORY

Spring College on Computational Nanoscience May Variational Principles, the Hellmann-Feynman Theorem, Density Functional Theor

ASSESSMENT OF DFT METHODS FOR SOLIDS

Introduction to density-functional theory. Emmanuel Fromager

Short Course on Density Functional Theory and Applications VII. Hybrid, Range-Separated, and One-shot Functionals

Session 1. Introduction to Computational Chemistry. Computational (chemistry education) and/or (Computational chemistry) education

This is a very succinct primer intended as supplementary material for an undergraduate course in physical chemistry.

One-Electron Hamiltonians

Quantum Chemical and Dynamical Tools for Solving Photochemical Problems

QMC dissociation energy of the water dimer: Time step errors and backflow calculations

Electronic Structure Calculations and Density Functional Theory

Density Functional Theory

MO Calculation for a Diatomic Molecule. /4 0 ) i=1 j>i (1/r ij )

Calculations of band structures

NWChem: Hartree-Fock, Density Functional Theory, Time-Dependent Density Functional Theory

Electrochemistry project, Chemistry Department, November Ab-initio Molecular Dynamics Simulation

Answers Quantum Chemistry NWI-MOL406 G. C. Groenenboom and G. A. de Wijs, HG00.307, 8:30-11:30, 21 jan 2014

Lecture 9. Hartree Fock Method and Koopman s Theorem

Parallel Eigensolver Performance on High Performance Computers

XYZ of ground-state DFT

7/29/2014. Electronic Structure. Electrons in Momentum Space. Electron Density Matrices FKF FKF. Ulrich Wedig


Ab initio calculations for potential energy surfaces. D. Talbi GRAAL- Montpellier

Electronic structure calculations: fundamentals George C. Schatz Northwestern University

Introduction to First-Principles Method

Lecture 8: Introduction to Density Functional Theory

Post Hartree-Fock: MP2 and RPA in CP2K

Basic introduction of NWChem software

Density Functional Theory: from theory to Applications

Set the initial conditions r i. Update neighborlist. Get new forces F i

Gaussian Basis Sets for Solid-State Calculations

Structure of Cement Phases from ab initio Modeling Crystalline C-S-HC

Computer Laboratory DFT and Frequencies

Density Functional Theory (DFT)

Reactivity and Organocatalysis. (Adalgisa Sinicropi and Massimo Olivucci)

Algorithms and Computational Aspects of DFT Calculations

Al 2 O 3. Al Barrier Layer. Pores

Translation Symmetry, Space Groups, Bloch functions, Fermi energy

Intermediate DFT. Kieron Burke and Lucas Wagner. Departments of Physics and of Chemistry, University of California, Irvine, CA 92697, USA

DFT and beyond: Hands-on Tutorial Workshop Tutorial 1: Basics of Electronic Structure Theory

MBPT and TDDFT Theory and Tools for Electronic-Optical Properties Calculations in Material Science

ABC of ground-state DFT

Introduction to DFT and Density Functionals. by Michel Côté Université de Montréal Département de physique

Time-Dependent Density-Functional Theory

Transcription:

HECToR CSE technical meeting, Oxford 2009 Parallel Algorithms for the Materials Modelling code CRYSTAL Dr Stanko Tomi Computational Science & Engineering Department, STFC Daresbury Laboratory, UK

Acknowledgements Dr Ian Bush (NAG) Dr Barry Searle (DL-STFC) Prof Nic Harrison (Imperial-DL-STFC) dcse-epsrc Initiative

Outline Motivation CRYSTAL code: Theoretical background CRYSTAL code: Numerical implementation CRYSTAL code: Parallelisation CRYSTAL code: Memory economisation Concussions

Motivation for using highly parallel CRYSTAL code a) Miniaturisation of devices and advance in current technology, requires material description at atomistic level and detailed quantum mechanical knowledge of its properties. b) New materials can not be described from empirical method because lack of parameters. Experimentally to expensive or parameter space is too big to be completely examined. c) Ab-initio model (parameter free) is the only option then d) Community that I support is interested in very big (computationally) systems: 1. bulks with dilute level of impurities (supercells) as a new materials, a building blocks, for the active region of novel PV or optoelectronics devices 2. Band alignments (offsets) at the surface interfaces: new materials, transport properties 3. Electronic structure of the QD (nano-clusters) form the first principles: new materials, PV, chemistry, biology e) The (1-3) are very challenging task and NEW algorithms are needed now!

Outline Motivation CRYSTAL code: Theoretical background CRYSTAL code: Numerical implementation CRYSTAL code: Parallelisation CRYSTAL code: Memory economisation Concussions

Hohenberg-Kohn Theory For any legal wave-function (anti-symmetric and normalised) * ˆ * E[ Ψ ] = Ψ HΨ dr = Ψ Hˆ Ψ The total energy is functional of E[ Ψ ] > E Search all (3N-coordinates) to minimize E in order to find ground state E 0!!! Theorem 1: External potential and the number of electrons in the system are uniquely determined by the density (r), so that the total energy is a unique functional of density: E=E[ (r)] 0 Theorem 2: The density that minimises the total energy is the ground state density and the minimum energy is the ground state energy, min{e[ (r)]}=e 0

Density Functional Theory-LDA 1 2 1 Z α i + + Ψ ( r1,..., rn ) = EΨ( r1,..., rn ) 2 i i, j ri rj i, α ri Rα For the ground state energy and density there is an exact mapping between many body system and fictitious non-interacting system 1 ρ( r ) Z dr V ( r) Eψ ( r) 2 α + + + XC ψ i = i i 2 r r α r Rα The fictitious system is subject to an unknown potential V XC derived from exchange-correlation functional In LDA the energy functional can be approximated as a local function of density: δ E ρ and 2 = [ ( )] XC V [ ] XC ρ r = ρ( r) ψ i ( r) δρ i The DFT Eq. describe non-interacting electrons under the influence of a mean field potential consisting of the classical Coulomb potential and local exchange-correlation potential

Hartree-Fock Theory Specific structure of HF WF: ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) * 1 Z 1 ψ r ψ r ψ r ψ r 1 ψ r ψ r ψ r ψ r α EHF = ψ i ( r) + ψ i ( r) dr + dr dr dr dr 2 α r Rα 2 r r 2 r r * * * * i 1 i 1 j 2 j 2 i 1 j 1 j 2 i 2 1 2 1 2 i, j i j i, j i j Coulomb energy Exchange energy After application of the variation theorem and under: ψ ψ = δ i j ij The ground state Hartree-Fock Equation: 1 2 ρ( r ) Z α + dr ψ i ( r) vx ( r, r ) ψ i ( r ) dr Eiψ i ( r) 2 + + = r r α r Rα Exact non-local exchange potential: Ψ 1 det[ ψ ψ... ψ N ] N! HF 1 2 = ψ * j ( r) ψ j ( r ) ψ ( r ) dr i j r r The Hartree-Fock Eq. describe non-interacting electrons under the influence of a mean field potential consisting of the classical Coulomb potential and NON-LOCAL exchange potential

Self Interaction Energy In DFT-LDA each individual electron contributes to the total density An electron in an occupied orbital interacts with N electrons, when it should interact with N-1 this is the self interaction DFT-LDA: all occupied bands are pushed up in energy by this interaction consequence of an on-site/diagonal Coulomb/Exchange repulsion. DFT- LDA band gap are TOO SMALL Hartree-Fock theory: Coulomb and exchange interactions cancels exactly, i.e. J ii = K ii, no self interaction, + absent of correlations: HF band gaps TOO BIG Hybrid functional 20% HF and 80% DFT Energy gaps appear to be good!!!

Functionals: PBE0 & B3LYP Hybrid Functional: An exchange-correlation functional in DFT that incorporates portion of exact exchange from HF (E HF x exact) with exchange and correlation from other sources (LDA, GGA, etc.) PBE0 = Perdew-Burke-Ernzerhof E = E +0.25( E E ) PBE0 GGA HF GGA xc xc x x B3LYP = Becke exchange + (Lee-Yang-Parr + Vosko-Wilk-Nusair) correlation E = E + a ( E E ) + a ( E E ) + a ( E E ) B3LYP LDA HF GGA LDA GGA LDA GGA xc xc 1 x x 2 x x 3 c c E = 0.8 E +0.2E + 0.72( E E ) B3LYP LDA HF B88( GGA) LDA x x x x x E = 0.19 E +0.81E B3LYP VWN 3( LDA) LYP( GGA) c c c HF E x exact = LDA 3 Exc d r f [ ρ( r)] 3 E = d r f [ ρ( r), ρ( r)] GGA xc

Outline Motivation CRYSTAL code: Theoretical background CRYSTAL code: Numerical implementation CRYSTAL code: Parallelisation CRYSTAL code: Memory economisation Concussions

Implementation: Gaussian Basis Set Anti-symmetric many body wave function Φ = Â N ψ ( r, k) N i i i= 1 One-electron wave-function=lc of Bloch functions = ψ ( r, k) a ( k) χ ( r, k) i i µ, i µ i µ Bloch function=lc of Atomic Orbitals χ ( r, k) = ϕ ( r r R) i µ i µ i µ e kr R Atomic Orbitals = LC of n G Gaussian type functions (orbitals) (GTO) α n µ µ G i d jg α j i µ j= 1 ϕ ( r r R) = (, r r R) α 2 G (, µ ) = exp[ ( µ ) ]( µ ) l ( µ ) m ( µ ) n j ri r j ri r xi x yi y zi z G = G G G r x y z needs to be found coeficients exponents

Linear dependency problem in basis set Typically Gaussian Atomic Orbitals, (r i -r -R), are localized and non-orthogonal!!! Very diffused Gaussians (very small exponent j ) can cause linear dependency problem and trouble SCF cycles!!! Two ways to cure this problem: 1. To limit basis set to less diffused Gaussians: typically up to ~0.1 Bohr -1 2. To project out several smallest eigenvalues of the overlap matrix S(k) that enters Fock equation: F(k)A(k) = S(k)E(k)A(k). In this way possible singularities in Fock matrix F(k) are avoided during SCF cycles

Implementation: Gaussian Basis Set Solution of secular equation in k- space gives: F( k) A( k) = S( k) E( k) A( k) a µ ( ), i k While we have basis functions (Gaussians) in r- space = F( k) F( g) i k e g Fock matrix sum of one- and two- electron contributions F ( g) = H ( g) + B ( g) i, j i, j i, j g One electron contributions: kinetic & nuclear H ( g) = T ( g) + Z ( g) = i Tˆ j + i Zˆ j i, j i, j i, j Two electron contribution: Coulomb and exchange 1 Bi, j ( g) = Ci, j ( g) + X i, j( g) = Pk, l[ i, j k, l i, k j, l ] 2 k, l NON-LOCAL term

Outline Motivation CRYSTAL code: Theoretical background CRYSTAL code: Numerical implementation CRYSTAL code: Parallelisation CRYSTAL code: Memory economisation Concussions

CRYSTAL scalability on HECToR Test 1: Bulk material (periodic solid) with dilute amount of impurities (1.62%): Ga 432 Sb 426 N 6 No of atoms: 864 Basis sets: m-pvdz-pp relativistic eff. core for Ga, Sb; and m-6-311g for N No of AO: 19837 Symmetry : 1 K-points: 1 X R + 1 x C 128 2048 2048 B: 96 SCF (s) 1024 SHELLX (s) 1024 512 MPP_DIAG (s) 64 32 B: 16-32 MPP_DIAG : wall time MPP_DAIG-MPP_SIML : elapsed 512 512 1024 2048 4096 No of processors 256 512 1024 2048 4096 No of processors 16 512 1024 2048 4096 No of processors Problem identified: system run only when: #PBS -l mppnppn=1 Bad memory management => unnecessary expensive to run

CRYSTAL scalability on HECToR Test 1: New command is implemented in CRYSTAL:. MPP_BLOCK if omitted in INPUT file 16 block is calculated in. CRYSTAL code MPP_DIAG (s) 70 68 66 64 62 60 58 56 54 No. procesors = 3584 Default 52 0 20 40 60 80 100 120 140 MPP_BLOCK Elapse : MPP_BACK + MPP_SIML (s) 125 120 115 110 105 100 No. procesors = 3584 95 0 20 40 60 80 100 120 140 MPP_BLOCK

CRYSTAL scalability on HECToR Test 2: Slab CdSe/ZnS No of atoms: 128 Basis sets: All electron basis sett for all: Cd, Se, Zn, and S No of AO: 4024 Symmetry : 2 K-points: 4 X R + 5 x C 64 16 64 32 NON-DC DIAG routine 32 16 SCF (s) 32 16 8 4 2 64 128 256 512 1024 2048 4096 No of processors SHELLX (s) 8 4 2 1 0.5 64 128 256 512 1024 2048 4096 No of processors 1 MPP_DIAG : wall time MPP_DAIG-MPP_SIML : elapsed 0.5 64 128 256 512 1024 2048 4096 8192 MPP_SIML + MPP_BACK (s) 8 4 2 1 CF 2.5 ideal (CF 2.5) CF 2 CF 3 CF 4 0.5 64 128 256 512 1024 2048 4096 8192 No of processes CMPLXFAC was changed in order to find optimal load balance between complex and real k-points diagonalization MPP_DIAG (s) 16 8 4 2 CF 2.5 ideal (CF 2.5) CF 2 CF 3 CF 4 No of processors

CRYSTAL scalability on HECToR Test 3: Cluster (QD) Cd 159 Se 166 No of atoms: 325 Basis sets: m-pvdz-pp relativistic eff. core for Cd and Se No of AO: 8747 Symmetry : 6 K-points: 1 X R SCF (s) 512 256 128 64 32 16 B16 B32 Default = B96 SHELLX (s) 512 256 128 64 32 16 MONMO3 (s) 32 16 8 4 2 1 0.5 64 128 256 512 1024 2048 4096 No of processors MPI_DIAG (s) 32 16 8 4 2 1 B16 B32 Default = B96 MPP_DIAG : wall time MPP_DAIG-MPP_SIML : elapsed 8 64 128 256 512 1024 2048 4096 No of processors 8 64 128 256 512 1024 2048 4096 N o of processors 0.5 64 128 256 512 1024 2048 4096 No of processors

Outline Motivation CRYSTAL code: Theoretical background CRYSTAL code: Numerical implementation CRYSTAL code: Parallelisation CRYSTAL code: Memory economisation Concussions

CRYSTAL memory economization on HECToR Test 1: Bulk material (periodic solid) with dilute amount of impurities (1.62%): Ga 432 Sb 426 N 6 No of atoms: 864 No of AO: 19837 Symmetry : 1 Problem identified: system run ONLY when: #PBS -l mppnppn=1 Bad memory management => unnecessary expensive to run!!! Solution: Replicated data structure of the Fock and density matrices is replaced with direct space IRREDUCIBLE representation of the Fock and density matrices for all class of SCF calculations

CRYSTAL memory economization on HECToR st53@nid00015:~/work/bulk_gasbn/896_sym> grep CYC mppnppn_1/out.dat INFORMATION **** MAXCYCLE **** MAX NUMBER OF SCF CYCLES SET TO 2 MAX NUMBER OF SCF CYCLES 2 CONVERGENCE ON DELTAP 10**-16 CYC 0 ETOT(AU) -2.147182650300E+05 DETOT -2.15E+05 tst 0.00E+00 PX 0.00E+00 CYC 1 ETOT(AU) -2.145996462515E+05 DETOT 1.19E+02 tst 0.00E+00 PX 0.00E+00 CYC 2 ETOT(AU) -2.146025490714E+05 DETOT -2.90E+00 tst 1.60E-04 PX 0.00E+00 == SCF ENDED - TOO MANY CYCLES E(AU) -2.1460254907141E+05 CYCLES 2 #PBS -l mppwidth=896 #PBS -l mppnppn=1 st53@nid00015:~/work/bulk_gasbn/896_sym> grep CYC mppnppn_2/out.dat INFORMATION **** MAXCYCLE **** MAX NUMBER OF SCF CYCLES SET TO 2 MAX NUMBER OF SCF CYCLES 2 CONVERGENCE ON DELTAP 10**-16 CYC 0 ETOT(AU) -2.147182650300E+05 DETOT -2.15E+05 tst 0.00E+00 PX 0.00E+00 CYC 1 ETOT(AU) -2.145996462515E+05 DETOT 1.19E+02 tst 0.00E+00 PX 0.00E+00 CYC 2 ETOT(AU) -2.146025490302E+05 DETOT -2.90E+00 tst 1.60E-04 PX 0.00E+00 == SCF ENDED - TOO MANY CYCLES E(AU) -2.1460254903022E+05 CYCLES 2 st53@nid00015:~/work/bulk_gasbn/896_sym> grep CYC mppnppn_4/out.dat INFORMATION **** MAXCYCLE **** MAX NUMBER OF SCF CYCLES SET TO 2 MAX NUMBER OF SCF CYCLES 2 CONVERGENCE ON DELTAP 10**-16 CYC 0 ETOT(AU) -2.147182650300E+05 DETOT -2.15E+05 tst 0.00E+00 PX 0.00E+00 CYC 1 ETOT(AU) -2.145996462515E+05 DETOT 1.19E+02 tst 0.00E+00 PX 0.00E+00 CYC 2 ETOT(AU) -2.146025573510E+05 DETOT -2.91E+00 tst 1.60E-04 PX 0.00E+00 == SCF ENDED - TOO MANY CYCLES E(AU) -2.1460255735103E+05 CYCLES 2 #PBS -l mppwidth=896 #PBS -l mppnppn=2 #PBS -l mppwidth=896 #PBS -l mppnppn=4

CRYSTAL memory economization on HAPU [st53@hapu33 test_si]$ grep CYC std/out.dat INFORMATION **** MAXCYCLE **** MAX NUMBER OF SCF CYCLES SET TO 200 MAX NUMBER OF SCF CYCLES 200 CONVERGENCE ON DELTAP 10**-17 CYC 0 ETOT(AU) -4.567509511167E+03 DETOT -4.57E+03 tst 0.00E+00 PX 1.00E+00 CYC 1 ETOT(AU) -4.570469348171E+03 DETOT -2.96E+00 tst 0.00E+00 PX 1.00E+00 == SCF ENDED - CONVERGENCE ON ENERGY E(AU) -4.5705661190498E+03 CYCLES 9 [st53@hapu33 test_si]$ grep CYC sym/out.dat INFORMATION **** MAXCYCLE **** MAX NUMBER OF SCF CYCLES SET TO 200 MAX NUMBER OF SCF CYCLES 200 CONVERGENCE ON DELTAP 10**-17 CYC 0 ETOT(AU) -4.567509511167E+03 DETOT -4.57E+03 tst 0.00E+00 PX 0.00E+00 CYC 1 ETOT(AU) -4.570469348171E+03 DETOT -2.96E+00 tst 0.00E+00 PX 0.00E+00 == SCF ENDED - CONVERGENCE ON ENERGY E(AU) -4.5705661190498E+03 CYCLES 9 [ [st53@hapu33 test_si]$ grep CYC sym_nolm/out.dat INFORMATION **** MAXCYCLE **** MAX NUMBER OF SCF CYCLES SET TO 200 MAX NUMBER OF SCF CYCLES 200 CONVERGENCE ON DELTAP 10**-17 CYC 0 ETOT(AU) -4.567509511167E+03 DETOT -4.57E+03 tst 0.00E+00 PX 0.00E+00 CYC 1 ETOT(AU) -4.570469348171E+03 DETOT -2.96E+00 tst 0.00E+00 PX 0.00E+00 == SCF ENDED - CONVERGENCE ON ENERGY E(AU) -4.5705661190498E+03 CYCLES 9 Old code: replicated data in memory MPP New code: irreducible representation MPP LOWMEM New code: irreducible representation MPP NOLOWMEM

CRYSTAL memory economization on HECToR HECToR phase 1 HECToR phase 2 1 core/node = 6 GB/core 2 core/node = 3 GB/core 1 core/node = 8 GB/core 2 core/node = 4 GB/core 4 core/node = 2 GB/core Memory economisation achieved: from more then 3 GB/core required to less the 2 GB/core, i.e., the whole node can be used efficiently.

Outline Motivation CRYSTAL code: Theoretical background CRYSTAL code: Numerical implementation CRYSTAL code: Parallelisation CRYSTAL code: Memory economisation Concussions

Conclusions 1) Memory bottleneck in the CRYSTAL code caused by replicated data structure of the Fock and density matrix is removed by replacing those data structures with its irreducible representations [saving: problem size/(2 x symmetry of the system)]. 2) For big-ish systems the optimal load balance between complex and real k-points is achieved with CMPLXFAC=3, that provide for best scalability of the CRYSTAL code up to 4864 cores on HECToR. 3) Implementation of the new command MPP_BLOCK provides for better control of the block size of distributed data. Reducing block size from default value of 96 to 64 or 32 it can be achieved more than 10% speed-up in diagonalisation part and about 20% speed-up in back and similarity transform for the system size rank=19837 run on 3584 cores. 4) For systems when integral calculations time prevails (MONMO3, SHELLX) over diagonalisation time (MPP_DIAG) better synchronisation of integral calculations is needed: possible global or shared counter implementations investigation. 5) For systems where diagonalisation (MPP_DIAG) dominates over integral calculations (MONMO3, SHELLX) careful examination of blocking is needed