Molecular Science Modelling


Molecular Science Modelling
Lorna Smith
Edinburgh Parallel Computing Centre
The University of Edinburgh
Version 1.0
Available from:


Technology Watch Report

Table of Contents

1 Introduction
2 Programming Models
  2.1 High Performance Fortran (HPF)
  2.2 Message Passing Interface (MPI)
  2.3 Linda
3 Theory
  3.1 Molecular Dynamics
  3.2 First Principle (Ab-initio) Calculations
    Hartree-Fock Self Consistent Field Method
    Density Functional Theory
4 Parallelisation
  4.1 Parallelisation Strategies
  4.2 Parallelisation of Molecular Dynamics
    Short Range Forces
    Long Range Forces
    Summary
  4.3 Parallelisation of First Principle Calculations
    Hartree-Fock Theory
    Density Functional Theory
    Summary
5 Parallel Codes
  5.1 Ab-initio electronic structure methods
    CETEP
    AIMPRO
    CRYSTAL
    GAMESS_UK
    GAUSSIAN
    Other Codes
  5.2 Molecular Dynamics
    AMBER
    DL_POLY
    DPD
    GBMEGA
    Other Codes
6 Conclusions
References

1 Introduction

The use of the term modelling in science has a specific meaning, one which does not relate to drawing or visualising models on a workstation or P.C. (although modellers can spend part of their time involved in this occupation). The term refers to techniques in which a set of mathematical equations is used to accurately represent some specific scientific phenomenon. Molecular modelling, that is the modelling of molecules or molecular systems, has diverse applications and is the basis of most computational chemistry techniques.

Previously, the application of certain molecular science modelling techniques has been limited by a lack of computational power. This limitation has been eased by the development of parallel computing, in conjunction with that of low cost computational components and fast interconnect technology. Parallel computing offers a cost-effective method of carrying out larger and more realistic simulations.

Parallelism in chemistry applications began in the 1980s, when applications utilised matrix-vector and matrix-matrix operations to take advantage of multiple vector registers on vector supercomputers (Clementi et al., 1984). This trend has been developing ever since, with the use of asynchronous disk operations (e.g. overlapping computations with disk reads and writes) for clusters of workstations. The current challenge is to develop applications which run efficiently on hundreds (and even thousands) of processors.

This report reviews the current status of Molecular Science Modelling techniques which have specific application to parallel computing. Two techniques are reviewed: molecular dynamics and ab-initio techniques. Arguably others should be discussed, such as Monte Carlo methods; however, this review focuses on the principal techniques currently in use in the UK.

Initially, brief descriptions are given of the programming paradigms most commonly utilised by these applications. This is followed by a brief description of the modelling techniques and a longer description of the parallelisation strategies they utilise. Finally, a review of the programs currently being utilised is given.


2 Programming Models

To write an efficient parallel algorithm, a number of attributes must be considered, such as load balancing, scalability and the tolerance of latency and low bandwidth for references to remote memory locations. The concept of nonuniform memory access (NUMA) is essential when designing parallel algorithms, where memory can be nonuniform not only in the latency and bandwidth of access but also in the way it is accessed. Parallel programming languages deal with NUMA in a variety of ways: either by hiding all complexity by using an automatic parallelising compiler, or by using a data driven model where the distribution of data is made explicit but all the data is referenced with the same language constructs (e.g. HPF). A subroutine interface can be used to access remote or distributed data (e.g. Linda), or in some cases no direct access is allowed to remote data (e.g. MPI).

There are a large number of parallel programming environments, not all of which are discussed here. This section briefly reviews some of the more common programming environments used to write parallel programs in the area of scientific modelling.

2.1 High Performance Fortran (HPF)

This parallel Fortran language is a set of constructs and extensions to Fortran 90 which allow the user to express parallelism in a relatively simple manner. It was designed to promote the wider use of parallelism by hiding the details of the architecture from the programmer and to provide a portable language.

HPF is primarily a data parallel language. The program resembles an ordinary sequential program, with the flow of control following strict sequential order except when parallel intrinsics or built-in procedures are called. The programmer views only a single memory; the actual distribution of data and the communication between processors are handled by the compiler with guidance from the programmer.

The programmer places specially designed comments, called compiler directives, within the code to aid the compiler in distributing the data and work on a parallel machine. Example directives include DISTRIBUTE and ALIGN, which are used to partition data among memory regions. The FORALL construct defines the assignment of multiple elements in an array without enforcing any order on the assignment to individual elements.

HPF works well where regular data structures are present (the case in some molecular dynamics algorithms). However, if either the load balancing or the data structures are not regular (as is the case for a lot of quantum chemistry), HPF becomes less effective.

Further information on High Performance Fortran can be obtained from the Technology Watch Report "High Performance Fortran. History, Overview and Current Status" (Richardson, 1995).
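As a rough illustration of what a block-style DISTRIBUTE directive asks the compiler to do, the sketch below computes which contiguous index range each processor would own and then applies an order-independent, FORALL-like update block by block. It is written in Python rather than HPF, and the function names `block_distribution` and `forall_scale` are hypothetical; the block size of ceil(n/p) is the usual convention for a BLOCK distribution, not something taken from this report.

```python
import math

def block_distribution(n, p):
    """Return the half-open index range [lo, hi) owned by each of p
    processors when n array elements are split into contiguous blocks."""
    block = math.ceil(n / p)              # assumed block size: ceil(n / p)
    ranges = []
    for rank in range(p):
        lo = min(rank * block, n)
        hi = min(lo + block, n)
        ranges.append((lo, hi))
    return ranges

def forall_scale(values, factor, ranges):
    """FORALL-like update: every element is assigned independently, so the
    order in which the blocks are visited does not change the result."""
    result = list(values)
    for lo, hi in reversed(ranges):       # deliberately not in rank order
        for i in range(lo, hi):
            result[i] = factor * values[i]
    return result

ranges = block_distribution(10, 4)
scaled = forall_scale(list(range(10)), 2, ranges)
```

With regular data such as this, every processor receives almost the same amount of work; the irregular workloads mentioned above are exactly the cases where such a static split stops being balanced.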

2.2 Message Passing Interface (MPI)

MPI is based on the message passing model, where each process has a local memory and no other process can directly read from or write to that local memory. Unlike the data parallel model discussed above, there is no globally addressable memory. This model is effective for algorithms that utilise static domain decomposition and for systolic loop type algorithms (both of which are used in some molecular dynamics codes), and for coarse grain algorithms (utilised in SCF calculations). Domain decomposition and systolic loop algorithms are described in more detail in later sections.

The program is written in a standard language (e.g. Fortran or C), with data movement being controlled by calls to communication routines from a communication library. MPI has been designed as a standard for message passing, and its development involved a large number of users and software and hardware vendors.

Although MPI represents a standard for message passing, other message passing packages exist which are utilised by the scientific modelling community. Three of the most common of these are PVM (Parallel Virtual Machine), TCGMSG (Theoretical Chemistry Group Message Passing System) and occam (which was inspired by the CSP (communicating sequential processes) model).

Further information on MPI can be obtained from the Technology Watch Report "MPI: A Message-Passing Interface Standard. History, Overview and Current Status" (Malard, 1996).

2.3 Linda

Linda is a coordination language which is built on a base language, such as C or Fortran. Linda is based on a distributed data structure model and creates, for every Linda program, a virtual shared memory called tuple space. This is simply a medium used to share data between different processes without the need for a physically shared memory, and it can be accessed by any process within a given Linda program.

Data is moved to and from tuple space in the form of tuples. Tuples are the data structures of tuple space and are simply collections of typed data objects or place holders (called elements). Linda is limited in its ability to provide information on how tuples are stored or accessed, and also by the need for general tuple matching, which can lead to inefficiencies in memory usage and communication. The lack of primitives for efficient global communication can also be a problem.
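The tuple space idea can be made concrete with a small single-process sketch. The class below is hypothetical and is not the Linda API: it borrows the operation names out, rd and in (written `in_` here because `in` is a Python keyword), uses Python types as the typed place holders, and returns None instead of blocking when nothing matches, which a real Linda implementation would not do.

```python
class TupleSpace:
    """Toy, single-process stand-in for a Linda tuple space."""

    def __init__(self):
        self._tuples = []

    def out(self, *fields):
        """Place a tuple of actual values into tuple space."""
        self._tuples.append(tuple(fields))

    @staticmethod
    def _matches(pattern, candidate):
        """A pattern field is either an actual value (must be equal) or a
        type acting as a typed place holder (must be an instance)."""
        if len(pattern) != len(candidate):
            return False
        for want, have in zip(pattern, candidate):
            if isinstance(want, type):
                if not isinstance(have, want):
                    return False
            elif want != have:
                return False
        return True

    def rd(self, *pattern):
        """Return a matching tuple without removing it (None if absent)."""
        for candidate in self._tuples:
            if self._matches(pattern, candidate):
                return candidate
        return None

    def in_(self, *pattern):
        """Return a matching tuple and remove it (None if absent)."""
        found = self.rd(*pattern)
        if found is not None:
            self._tuples.remove(found)
        return found

space = TupleSpace()
space.out("energy", 3, -1.5)              # e.g. a result for task number 3
peeked = space.rd("energy", int, float)   # read, tuple stays in the space
taken = space.in_("energy", int, float)   # withdraw, tuple is removed
```

Every rd or in has to search for a matching tuple, which hints at why general tuple matching can be costly in both memory and communication.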

3 Theory

In this section a brief description of the two techniques, molecular dynamics and ab-initio techniques, is given, followed by a more detailed examination of the parallelisation strategies. Reviews of these (with application to parallel computing) include those of Harrison and Shepard (1994), Kendall et al. (1995) and Smith (1993a).

3.1 Molecular Dynamics

Molecular dynamics is a method for solving the many-particle equations of motion for a molecular system (Allen et al., 1997). It is used to determine equilibrium and transport properties of a classical many-body system and involves the iterative computation of the total potential energy, forces and coordinates of every atom in the system at each of a number of time steps. A molecular dynamics simulation involves the following steps:

1) An initial starting configuration is constructed. This requires a set of initial positions for the atoms, which can for example be taken from a known crystal structure or can simply be a set of random numbers. Initial velocities for the atoms are also required, which can again be random numbers.
2) The system is then initialised by scaling the velocities to the desired temperature.
3) The forces on each particle are calculated.
4) The movement of the particles within some time interval is calculated (from the atomic positions, velocities and forces).
5) The atomic positions are updated and the process is repeated.

This cycle has to be repeated numerous times within a molecular dynamics simulation.

Molecular dynamics describes the molecular system as a function of time. This is an advantage over Monte Carlo methods, in that time dependent phenomena such as transport properties (e.g. viscosity) can be calculated directly. The most computationally expensive part of a molecular dynamics simulation is the force calculation, i.e. the calculation of the interactions between particles.
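The five steps listed above can be sketched as a short serial program. This is a minimal illustration under several assumptions that are not in the report: a Lennard-Jones pair potential in reduced units (mass, epsilon, sigma and Boltzmann's constant all equal to 1), no periodic boundaries, a simple cubic starting lattice, and the velocity Verlet integrator. All function names are hypothetical.

```python
import math
import random

def lattice(n_side, spacing):
    """Step 1: positions on a simple cubic lattice (stand-in for a crystal)."""
    return [[i * spacing, j * spacing, k * spacing]
            for i in range(n_side) for j in range(n_side) for k in range(n_side)]

def temperature(vel):
    """Instantaneous temperature in reduced units (mass = k_B = 1)."""
    return sum(c * c for v in vel for c in v) / (3.0 * len(vel))

def scale_velocities(vel, target):
    """Step 2: rescale the velocities to the desired temperature."""
    factor = math.sqrt(target / temperature(vel))
    return [[factor * c for c in v] for v in vel]

def pair_forces(pos):
    """Step 3: all pair forces, an O(N^2) double loop over i < j."""
    n = len(pos)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n - 1):
        for j in range(i + 1, n):
            d = [pos[i][k] - pos[j][k] for k in range(3)]
            r2 = sum(c * c for c in d)
            inv6 = 1.0 / r2 ** 3
            # Lennard-Jones force divided by r: 24 (2 r^-12 - r^-6) / r^2
            f_over_r = 24.0 * (2.0 * inv6 * inv6 - inv6) / r2
            for k in range(3):
                forces[i][k] += f_over_r * d[k]
                forces[j][k] -= f_over_r * d[k]   # Newton's third law
    return forces

def step(pos, vel, forces, dt):
    """Steps 4 and 5: velocity Verlet update; returns the new forces."""
    for i in range(len(pos)):
        for k in range(3):
            vel[i][k] += 0.5 * dt * forces[i][k]
            pos[i][k] += dt * vel[i][k]
    forces = pair_forces(pos)
    for i in range(len(pos)):
        for k in range(3):
            vel[i][k] += 0.5 * dt * forces[i][k]
    return forces

def run(n_side=3, spacing=1.2, target=1.0, dt=0.002, n_steps=10, seed=0):
    rng = random.Random(seed)
    pos = lattice(n_side, spacing)
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in pos]
    vel = scale_velocities(vel, target)
    forces = pair_forces(pos)
    for _ in range(n_steps):                      # the repeated cycle
        forces = step(pos, vel, forces, dt)
    return pos, vel, forces
```

Almost all of the run time is spent in `pair_forces`, whose double loop visits every pair of atoms once; `step` only touches each atom a fixed number of times.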
The integration of the equations of motion is also computationally intensive; however, the force calculation is of order N^2 (where N is the number of particles), whereas the integration of the equations of motion is of order N. The pairwise interactions can be calculated independently of one another. This makes this part of the calculation inherently parallel, a fact that has been exploited by parallel molecular dynamics codes.

3.2 First Principle (Ab-initio) Calculations

These calculations involve the direct calculation of material properties from fundamental quantum mechanical theory (Gillan, 1994) and essentially involve solving various approximations to the Schrödinger equation that describes their basic structure. Fundamental properties, for example bond strengths and reaction energies, have been calculated from first principles.

The system being studied (an atom or a molecule) is described by a wavefunction (generally a complex mathematical expression). The wavefunction can then be used to determine various properties and the energy of the system. The problem is that, for the systems being studied, the wavefunction cannot be determined analytically. Approximations need to be made, the most common being the Born-Oppenheimer approximation. This allows the nuclei to be regarded as stationary whilst the electrons move (because the nuclei are much heavier than the electrons).

The motion of the electrons is correlated, i.e. the motion of one electron is affected by the other electrons. However, another approximation that is often made is that the motion of the particles is not correlated. The particles still interact, but each particle experiences an interaction which arises from a smeared representation of the average position of all the other particles, rather than an instantaneous interaction which changes as they move. The problem now consists of finding a set of individual wavefunctions, one for each particle. These individual wavefunctions are known as molecular orbitals.

Hartree-Fock Self Consistent Field Method

The simplest method of determining the electronic structure of molecules by forming approximate solutions to the Schrödinger equation (within the Born-Oppenheimer approximation) is the Hartree-Fock or self-consistent field (SCF) method. This method takes the molecular wavefunction (a non-separable function of the coordinates of all N electrons) and approximates it as an antisymmetric product of N one electron functions.
Each of these one electron functions (called molecular orbitals) is expanded in an underlying basis set (typically atom-centred Gaussian-like functions), and the molecular orbitals are then determined by minimising the total energy with respect to the expansion coefficients (C). The simplest N-electron wavefunction in use is a single antisymmetric product (Slater determinant) of one electron functions which are orthonormal linear combinations of the atomic orbital basis functions.

The most computationally expensive part of the calculation is computing the derivatives of the energy with respect to the molecular orbital coefficients. This is closely related to the Fock matrix (F):

$$F_{ij} = h_{ij} + \frac{1}{2} \sum_{kl} \left[ 2\,(ij|kl) - (ik|jl) \right] D_{kl}$$

where $D_{kl}$ is the density matrix, and $h_{ij}$ and $(ij|kl)$ are the one and two electron integrals over the underlying basis functions. The basis is typically of dimension N=O( ) and thus, even allowing for sparsity, the two electron integrals (which have four labels) are numerous. The major computation is the evaluation of the non-zero integrals, and the largest data requirements are due to the Fock and density matrices (both O(N^2)). The problems are:

1) Distributing and accessing the matrices to minimise communication costs.
2) Maintaining load balance in the presence of sparsity and large variation in the cost of integral evaluation.

The construction of the Fock matrix involves the contraction of a large, sparse four-index matrix (the electron-repulsion integrals) with a two-index matrix (the electronic density) to yield another two-index matrix (the Fock matrix). Both two-index matrices are of size N*N, where N is the size of the underlying basis set (N= typically). The number of integrals scales between O(N*N) and O(N*N*N*N), depending on the nature of the system and the level of accuracy required.

The methodology is as follows:

1) A position is chosen for the atomic nuclei.
2) A set of Gaussian basis functions is chosen.
3) An initial guess of the form of the one electron wavefunctions is generated by choosing the coefficients of the Gaussian basis functions representing each molecular orbital.
4) The density matrix is computed.
5) The Fock matrix is constructed.
6) The N equations are solved.
7) If solving the N equations yields improved molecular orbitals, these are used in the first step of a new iteration. Otherwise the iteration is terminated and the process continues to stage 8.
8) The total energy of the system can now be evaluated.

Density Functional Theory

Hartree-Fock theory uses an exact Hamiltonian and approximate many-electron wavefunctions. The correlations between electrons can be either long-range or short-range. Self consistent field theory deals with long-range forces by using averaging techniques, i.e. the field experienced by an atom depends on the global distribution of the atoms. Short range correlations, which involve the local environment around the atoms (i.e. deviations from the average), are not treated by the self consistent field method. These short range forces are often minor; however, in some cases, such as high temperature ceramic superconductors, these correlations are strong and need to be considered.

Kohn and Sham (1965) developed a theory to deal with this problem. It has been termed density functional theory, since the electron density plays a crucial role. Effectively, the energy is written as a function of the electron density rather than in terms of the many-electron wavefunctions. Approximations are made to the Hamiltonian.
The difference between the two techniques can be seen by considering the forms of the energy for Hartree-Fock and for density functional theory.

Hartree-Fock:

$$E_{HF} = V + hP + \frac{1}{2} P\,j(P) - \frac{1}{2} P\,k(P)$$

where V is the nuclear repulsion energy, P is the density matrix, hP is the one electron (kinetic + potential) energy, (1/2) P j(P) is the classical Coulomb repulsion of the electrons and (1/2) P k(P) is the exchange energy resulting from the quantum nature of the electrons.

Density functional theory:

$$E_{KS} = V + hP + \frac{1}{2} P\,j(P) + E_x[P] + E_c[P]$$

where $E_x[P]$ is the exchange functional and $E_c[P]$ is the correlation functional. The exchange and correlation functionals are integrals of some function of the density and, in some cases, the density gradient.

A methodology similar to that of the Hartree-Fock method is used. The Hamiltonian is broken down into basic one electron and two electron components as before; however, the two electron components are further reduced to a combination of the Coulomb term and the exchange-correlation term(s). This extra term is incorporated into the Fock matrix. The correlation term is normally integrated numerically on a grid, or fitted to a Gaussian basis and then integrated analytically. The computationally intensive parts of the calculation involve the fitting of the density, the construction of the Coulomb potential, the construction of the exchange-correlation potential and the subsequent diagonalisation of the resulting equations.

The exchange-correlation potential in density functional theory is determined only by the electron density. The precise dependence on density is not known except for the homogeneous electron gas. In other situations the electron density varies through space, and the assumption is made that the exchange-correlation at a certain point is given by the homogeneous electron gas value for the density at that point.

The calculation is iterated to self consistency: at each cycle the charge density is determined and compared with the charge density previously used to generate the effective potential, and if it is an improvement on that charge density the cycle continues. To start, an initial guess is made of the electronic charge density, and the Hartree potential and exchange-correlation potential are then calculated. The Hamiltonian matrices for each of the k points included in the calculation are constructed and diagonalised to obtain the Kohn-Sham eigenstates.
These eigenstates can then be used to generate the charge density. A new set of Hamiltonian matrices is then generated, and the process is repeated until the output charge density is self consistent with the charge density used to construct the electronic potentials. In general, the Kohn-Sham equations are used rather than the Hartree-Fock equations, the methodology being very similar.
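To make the cost of the Hartree-Fock Fock matrix construction from this section concrete, the following sketch contracts a four-index array of two electron integrals with a density matrix, following F_ij = h_ij + (1/2) sum_kl [2 (ij|kl) - (ik|jl)] D_kl. It is an illustration only: the function name `build_fock` and the dense nested-list storage `eri[i][j][k][l] = (ij|kl)` are assumptions, and no sparsity or integral symmetry is exploited, which a production code would certainly do.

```python
def build_fock(h, eri, density):
    """Return F with F_ij = h_ij + 1/2 * sum_kl [2 (ij|kl) - (ik|jl)] D_kl.

    h        -- N x N one electron integrals, h[i][j]
    eri      -- N x N x N x N two electron integrals, eri[i][j][k][l] = (ij|kl)
    density  -- N x N density matrix, density[k][l]
    """
    n = len(h)
    fock = [[h[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(n):
            total = 0.0
            for k in range(n):
                for l in range(n):
                    coulomb = 2.0 * eri[i][j][k][l]    # 2 (ij|kl)
                    exchange = eri[i][k][j][l]         # (ik|jl)
                    total += (coulomb - exchange) * density[k][l]
            fock[i][j] += 0.5 * total
    return fock

# One basis function: F = -1 + 1/2 * (2 * 0.5 - 0.5) * 2
tiny = build_fock([[-1.0]], [[[[0.5]]]], [[2.0]])
```

The four nested loops make the O(N^4) worst-case scaling of the integral work visible, while the matrices that must be stored or communicated (`h`, `density` and the result) are only O(N^2).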

4 Parallelisation

Most of the original work on parallel processing of chemistry applications was done by the LCAP (Loosely Coupled Array Processors) project (Clementi, 1990), whose aim was "to couple readily available commercial processors to form a system that is not massively parallel, but rather is modular and can be expanded to match the degree of parallelism that a set of applications can support". For example, a direct SCF calculation was carried out using a master/slave model, whereby each processor calculated a subset of the electron integrals and passed a partial Fock matrix back to the master processor, which added these together. This project was probably the pioneering work in parallel computational chemistry and provided the incentive for later developments, many of which use the replicated data technique developed in this project.

In the area of quantum chemistry, growth in parallel codes has been relatively slow since this project, and considerably slower than the large growth in users of quantum chemistry codes. This is partly due to the size of standard codes such as SPARTAN (Carpenter et al.) and Gaussian (Frisch et al.), and also due to a lack of code developers. This has been somewhat remedied, with parallel versions of the standard packages Gaussian, GAMESS (Guest et al., 1987), HONDO (Dupuis et al., 1993) and TURBOMOLE (Ahlrichs et al., 1989) now in existence. The emphasis is, however, still primarily on users' own codes.

Parallelisation of molecular dynamics calculations has fared rather better historically, with the inherently parallel nature of molecular dynamics described extensively in the literature (a number of these reviews are referenced in the text). A number of widely used simulation packages have been parallelised, such as CHARMM (Brooks et al., 1992), AMBER (Weiner et al., 1984), Discover (Biosym Technologies) and GROMOS (Biomos B.V.).
In this section the principal parallelisation techniques utilised in molecular dynamics and ab-initio codes are discussed.

4.1 Parallelisation Strategies

There are five basic strategies for parallelisation. The first two of these are discussed briefly here, whilst the latter three, which have been applied more extensively to molecular dynamics simulations and first principle calculations, are discussed in more detail in the relevant later sections.

1) Cloning
2) Master-Slave
3) Replicated Data
4) Systolic Loops
5) Domain Decomposition

Hybrids of these are also available.

Cloning simply involves allocating P independent simulations to P processors. This technique is both easily implemented and very efficient, but it is limited in application. It is particularly suited to Monte Carlo simulations, where each processor conducts an independent random walk.

The Master-Slave model utilises a master processor to run, or control, the simulation. This processor allocates work to other processors when necessary. The problems with this model are communication difficulties and load balancing.

4.2 Parallelisation of Molecular Dynamics

When considering the parallelisation of molecular dynamics there are two important considerations. Firstly, the algorithm must be effective for a relatively small number of atoms (e.g. fewer than 1000), as the aim of any simulation must be to model the system accurately with the smallest number of atoms (and thus to perform each time step as rapidly as possible). Most molecular dynamics simulations are carried out on systems ranging in size from a few hundred to a few thousand atoms. Secondly, truly scalable algorithms are important, so that larger and faster parallel machines developed in the future can be exploited.

The parallelisation of molecular dynamics calculations is discussed in this section. The force terms involved in a simulation are typically non-linear functions of the distance between pairs of atoms and can be either long-range or short-range. Both types of force are discussed.

Short Range Forces

The three most common methods of parallelising short range molecular dynamics simulations were suggested by Plimpton (1995), who developed:

1) Atom Decomposition. This is based on the replicated data method.
2) Force Decomposition. This involves either a systolic loop method or a force matrix method.
3) Domain Decomposition.
A common domain decomposition method is the Linked Cell method. An extension of this method is called spatial decomposition (also known as geometric methods).

1. Atom Decomposition

This method, which is based on the replicated data strategy, has identical copies of the configuration data on all the processors. A subgroup of atoms is assigned to each processor, and the processor computes the forces on its atoms no matter where they move in the simulation domain, hence the name atom decomposition.

Firstly, each processor has a complete copy of the coordinates and velocities of the atoms in the system. Each processor is assigned a sub-block of the N*N force matrix (where N is the number of atoms) to calculate. For example, if there are P processors and N(N-1)/2 interactions (the factor of 1/2 is a result of Newton's third law, F_ij = -F_ji), then each processor calculates N(N-1)/2P of these interactions.
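The division of the N(N-1)/2 pair interactions among P processors can be illustrated with a short serial sketch. The round-robin dealing of pairs, the one-dimensional coordinates and the toy pair force f_ij = x_j - x_i are all assumptions made for brevity; each simulated processor sees the full (replicated) coordinate list but fills in only part of the force array.

```python
def pair_list(n):
    """All N(N-1)/2 unordered pairs (i, j) with i < j."""
    return [(i, j) for i in range(n - 1) for j in range(i + 1, n)]

def split_pairs(pairs, p):
    """Deal the pairs round-robin so each processor gets about
    N(N-1)/2P of them."""
    return [pairs[rank::p] for rank in range(p)]

def partial_forces(x, pairs):
    """Forces from one processor's share of the pairs. The coordinates x
    are replicated; the returned force array is incomplete."""
    forces = [0.0] * len(x)
    for i, j in pairs:
        f_ij = x[j] - x[i]        # toy pair force (assumption)
        forces[i] += f_ij
        forces[j] -= f_ij         # Newton's third law, F_ji = -F_ij
    return forces

x = [0.0, 1.0, 3.0, 6.0, 10.0, 15.0]
shares = split_pairs(pair_list(len(x)), 4)
partials = [partial_forces(x, share) for share in shares]
# Summing the incomplete arrays element by element gives the total forces.
total = [sum(column) for column in zip(*partials)]
```

The final element-wise sum stands in for the global communication step that a real replicated data code needs before every processor holds the complete forces.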

At this point no processor has a complete representation of the force matrix and hence none can build the total particle forces. The incomplete force arrays must be circulated to all the other processors to complete the summation of the forces on each processor. This requires a global pass-and-sum (Smith, 1991). In this strategy each processor exchanges its data (N/P of the data) with an adjacent processor and the arrays are then summed. Following this, each processor exchanges its data (now 2N/P) with a processor two positions away (+2 or -2 away) and the arrays are again summed. The procedure is then repeated with a processor four positions away (+4 or -4 away) and so on. As every processor concurrently follows this sequence, the end result is an identical sum, on all processors, of all the original arrays local to each processor. Figure 1 shows the scheme. The idea was outlined by Fox (Fox et al., 1988).

Figure 1: The global pass-and-sum scheme

Finally, the equations of motion are integrated independently on each processor, again without reference to any other processor.

The scheme benefits from simplicity. Routines exist for the global pass-and-sum and these can be inserted at the proper locations in the code. Few other changes are typically required to parallelise the code. The duplication of information on each processor allows for straightforward computation of three and four body force terms.

Data replication on each processor implies that the strategy is expensive in terms of memory. The efficiency of the algorithm is limited by the global summation of the forces, where the communication scales as N, independent of P. This is demonstrated by Plimpton (1995) in his benchmark of a Lennard-Jones code, where communication costs started to dominate as the number of processors increased.
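The exchange pattern of the global pass-and-sum (partners 1, 2, 4, ... positions away) can be simulated in a few lines. This sketch makes two simplifying assumptions that are not in the text: the number of processors is a power of two, and every processor exchanges a full-length array at each stage rather than the growing N/P, 2N/P, ... segments described above. The function name `global_sum` is hypothetical.

```python
def global_sum(local_arrays):
    """Simulate a pass-and-sum over P = 2^m processors.

    local_arrays[rank] is the incomplete array held by processor `rank`.
    Returns the arrays held by every processor after log2(P) exchanges.
    """
    p = len(local_arrays)
    if p & (p - 1):
        raise ValueError("this sketch assumes a power-of-two processor count")
    data = [list(a) for a in local_arrays]
    stride = 1
    while stride < p:
        updated = []
        for rank in range(p):
            partner = rank ^ stride    # the processor `stride` positions away
            updated.append([a + b for a, b in zip(data[rank], data[partner])])
        data = updated                 # all processors exchange concurrently
        stride *= 2
    return data

result = global_sum([[1, 0], [0, 2], [3, 0], [0, 4]])
```

After log2(P) stages every simulated processor holds the same complete sum, which is what allows the equations of motion to be integrated without further communication.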

Examples of the use of this algorithm include the parallel implementations of GROMOS (Skeel, 1991) and CHARMM (Brooks et al., 1983). Bruge (Bruge et al., 1988) also developed a molecular dynamics program for ST2 water molecules using this technique.

2. Force Decomposition

There are two types of force decomposition: the systolic loop algorithm and the force-matrix formalism described by Plimpton (1995). There are a number of different types of systolic algorithm; Raine (Raine et al., 1989) described three separate types:

1) Systolic Loop Single Group (SLS-G)
2) Systolic Loop Double Group (SLD-G)
3) Systolic Loop Bidirectional Group (SLB-G)

The general concept involves packets of data being circulated between processors, with the packets containing data relating to a subset of atoms (e.g. the atom coordinates, velocities and force accumulators). All these algorithms are fully distributed: each processor processes only a subset of the total system data, and hence the memory demands are less than those of the replicated data (atom decomposition) strategy.

In the SLD-G algorithm the data is shared between processors so that each processor has a group of atoms (with the force accumulators set to zero). This algorithm requires that the processors are connected in a ring topology with an odd number of processors. Each processor duplicates its packet, which contains the coordinate arrays and force accumulators. One packet remains on the home processor (i.e. it is fixed) and the other is passed between processors. The pair forces within a home group are calculated and added to the home force accumulators. The duplicated packet is then passed to the next processor in a specific direction around the ring, so that each processor now has the atomic coordinates and force accumulators for two packets, and the forces between these groups can be calculated and added to the force accumulators.

These packets are then passed again in the same direction to the next processor and the process is repeated. With P processors the packet must be passed (P-1)/2 times for all possible pair forces to be calculated (this is the reason there must be an odd number of processors). The duplicated packets must then be passed back (in the opposite direction to the way they were sent) to their original home processor, and the force accumulators of the replicated packet and the home processor packet added. Figure 2 shows the passing scheme for five processors.
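The claim that (P-1)/2 passes around a ring of an odd number of processors cover every pair of groups can be checked with a small bookkeeping sketch. It simulates only the movement of the travelling packets, not the force calculation, and assumes that packets move from processor p to processor p+1 at every pass; the function name `sldg_group_pairs` is hypothetical.

```python
from collections import Counter

def sldg_group_pairs(p):
    """Count how often each pair of atom groups meets on a processor during
    the (P-1)/2 forward passes of the SLD-G scheme (P must be odd)."""
    if p % 2 == 0:
        raise ValueError("SLD-G requires an odd number of processors")
    meetings = Counter()
    for step in range(1, (p - 1) // 2 + 1):
        for proc in range(p):
            # After `step` passes, processor `proc` hosts the travelling
            # packet whose home is `step` positions behind it on the ring.
            visitor = (proc - step) % p
            meetings[frozenset((proc, visitor))] += 1
    return meetings

meetings = sldg_group_pairs(5)
```

For five processors the counter contains all ten distinct pairs of groups, each met exactly once, so no pair force between groups is computed twice or missed.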

Figure 2: The scheme utilised by the SLD-G algorithm to pass packets of data.

This algorithm benefits from good load balancing and, as mentioned, is less memory expensive than atom decomposition (replicated data). However, it suffers from the need to send the replicated data packets back to their home nodes; this rewind step is wasteful. The SLB-G algorithm was developed to reduce this waste.

As with the SLD-G algorithm, the SLB-G algorithm involves the data being shared between the processors in a ring topology with an odd number of processors. Duplicated data packets are again produced. In this case, however, the two packets on a processor are sent in opposite directions, i.e. there is no home packet. At the end of (P-1)/2 data passes the duplicate data packets are within one pass of each other, and hence the rewind step is much shorter than for the SLD-G algorithm. This technique is shown schematically in Figure 3. The velocities of the atoms must be included in the data packets, as there is no home processor to return to and they are needed when the force calculations are complete.

Figure 3: The scheme utilised by the SLB-G algorithm to pass packets of data

The last algorithm, the SLS-G algorithm, initially has two data packets assigned to each processor. However, unlike in the previous two algorithms, the two packets on a node represent different groups of atoms. The number of processors can be odd or even, and the processors are connected in a line with a head and a tail processor at either end. Initially the forces within packets are calculated, and then the forces between the different packets on a processor are calculated. The data packets are then exchanged. Each processor (other than the head or tail processor) sends the first data package to the right and simultaneously receives one from the left. The processor then sends the second package to the left and receives one from the right. The tail processor (the processor at the far right of the chain) sends the first data package (package A) to the left; this is then replaced by the second data package on the same processor (package B). The tail processor also receives one package from the left. On the next pulse package B is sent to the left. The head processor (the far left processor) has one data package permanently fixed. The other package is sent to the right and a packet is received from the right. See Figure 4. The number of sends required to return the packets to their home processors with completed force accumulators is 2P-1.

Figure 4: The scheme utilised by the SLS-G algorithm to pass packets of data

This algorithm has the advantage of being more generally applicable than the previous two.

Some attempts have been made to improve the efficiency of these algorithms, mainly focusing on overlapping the communications with the computation of the forces. Systolic loop algorithms have been used successfully by a number of authors; for example, Heller et al. (1990) built a sixty node MIMD parallel computer with a systolic loop architecture and programmed it in occam 2. They were interested in carrying out molecular dynamics simulations of large biopolymers.

Force-matrix algorithms differ from atom decomposition in that the algorithm is based on a block decomposition of the force matrix rather than a row-wise decomposition. This method is advantageous in that the memory and communication costs are reduced by a factor of sqrt(P) compared with the atom decomposition methods. Plimpton's Lennard-Jones benchmark problem (Plimpton, 1995) continued to speed up, even when hundreds of processors were used.

3. Domain Decomposition

The Linked Cell method is a commonly utilised domain decomposition method. In the sequential version of the linked cell algorithm, the molecular dynamics simulation cell is divided into smaller, identically sized subcells. Their width must be slightly greater than the cut-off radius, but apart from this the number of cells is chosen to be as large as possible. Each atom is assigned to the appropriate subcell and a linked list is created, providing a means by which each atom may be located. A header list is also constructed which identifies the first member of each subcell. For one subcell, the interactions between each atom of the subcell and its neighbours in the same subcell or in one of the neighbouring subcells are calculated. Half of the neighbouring cells are excluded to avoid double counting of pairwise interactions. This is carried out for each subcell, leading to the force evaluation.

To parallelise this scheme, the molecular dynamics simulation cell is divided into regions in a manner similar to that used in the serial version. Each region has the same shape and size (to ensure good load balancing), although it should be noted that this need not be cubic. Each region is then assigned to a specific processor; the region must be several times larger than the pair-force cut-off. The region on each processor is then further subdivided into subcells (as in the sequential algorithm), taking the cut-off range into account. The mapping of the regions onto the parallel processors is important and should ensure that neighbouring processors on the network handle neighbouring regions of the molecular dynamics cell.

Every subcell within a region has enough neighbouring subcells to complete the calculation. The exception to this is the subcells which lie on the region's boundaries. To calculate the forces on the atoms in subcells at the boundaries, the neighbouring processors must exchange copies of the relevant boundary subcell data. Two possible strategies exist to do this. The first involves exchanging copies of the atoms occupying the relevant boundary regions, then using these copies to calculate the pair forces for the resident boundary regions only, i.e. no force data is communicated, only coordinate data, which is passed in both directions (Pinches et al., 1991). The second method involves passing the boundary region coordinates in one direction only (e.g. the north, east and up boundary regions).
These are then used to calculate the atom forces (on atoms in the south, west and down boundary regions of neighbouring cells), and the calculated forces are communicated back to the nodes in the south, west and down directions, where they are added to the forces in the north, east and up regions of these nodes. The first method benefits from overlapped two-way communication but involves some duplication of the force calculation. The second method involves no force duplication but has no overlapped communications. Smith (1991) suggested that there was unlikely to be any substantial difference between the two methods.

After the boundary data has been communicated, the pair forces are calculated. The process then continues as in the sequential algorithm. The integration of the equations of motion has the advantage that each processor integrates the equations of motion for its own atoms only. It is important to keep track of atoms which move out of the region allocated to one processor and into the region of another. After the equations of motion have been integrated, the locations of the atoms must be checked, and the coordinates and velocities of any atom that has left its region must be reallocated to the appropriate neighbouring processor.

This algorithm is relatively easy to parallelise and is appropriate for simulations of very large systems. Pinches et al. (1991) used the linked cell algorithm successfully for systems of atoms in two and three dimensions.

Spatial decomposition (also called the geometric method) is very similar in nature to the linked cell method. As with the linked cell method, the simulation box is divided into smaller three-dimensional boxes. However, only one box is assigned to each processor. The size and shape of the boxes depend on the total number of atoms and the number of processors, and a cubic box is favoured to minimise communication costs. The method also differs in that the box lengths may be smaller or larger than the force cut-off length.
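The bookkeeping of the sequential linked cell algorithm described earlier (a header list giving the first atom of each subcell and a linked list chaining the remaining atoms) can be illustrated with a minimal Python sketch. It assumes a cubic periodic box, and all function names are hypothetical. For brevity it avoids double counting by only accepting pairs with i < j, rather than by excluding half of the neighbouring subcells as the text describes.

```python
import random

def minimum_image(d, box):
    """Shortest periodic separation along one axis."""
    return d - box * round(d / box)

def within_cutoff(p, q, box, cutoff):
    d2 = sum(minimum_image(a - b, box) ** 2 for a, b in zip(p, q))
    return d2 < cutoff * cutoff

def cell_index(cx, cy, cz, ncell):
    return ((cx % ncell) * ncell + (cy % ncell)) * ncell + (cz % ncell)

def build_cell_list(positions, box, cutoff):
    """Return (head, nxt, ncell): head[c] is the first atom in subcell c,
    nxt[i] is the next atom in the same subcell as atom i (-1 ends a chain)."""
    ncell = max(1, int(box // cutoff))   # subcell width box/ncell >= cutoff
    width = box / ncell
    head = [-1] * (ncell ** 3)
    nxt = [-1] * len(positions)
    for i, (x, y, z) in enumerate(positions):
        c = cell_index(int(x / width), int(y / width), int(z / width), ncell)
        nxt[i] = head[c]                 # push atom i onto the front of the chain
        head[c] = i
    return head, nxt, ncell

def linked_cell_pairs(positions, box, cutoff):
    """All pairs (i, j), i < j, closer than the cut-off, found via subcells."""
    head, nxt, ncell = build_cell_list(positions, box, cutoff)
    pairs = []
    for cx in range(ncell):
        for cy in range(ncell):
            for cz in range(ncell):
                # A set removes duplicates when periodic wrapping maps
                # several offsets onto the same subcell (ncell < 3).
                neighbours = {cell_index(cx + dx, cy + dy, cz + dz, ncell)
                              for dx in (-1, 0, 1)
                              for dy in (-1, 0, 1)
                              for dz in (-1, 0, 1)}
                i = head[cell_index(cx, cy, cz, ncell)]
                while i != -1:
                    for c in neighbours:
                        j = head[c]
                        while j != -1:
                            if i < j and within_cutoff(positions[i], positions[j], box, cutoff):
                                pairs.append((i, j))
                            j = nxt[j]
                    i = nxt[i]
    return sorted(pairs)

def brute_force_pairs(positions, box, cutoff):
    """Reference O(N^2) search used only to check the subcell search."""
    n = len(positions)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if within_cutoff(positions[i], positions[j], box, cutoff)]
```

In a parallel version each processor would build such lists for its own region only, after receiving copies of the boundary subcells from its neighbours.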
Each processor maintains two data structures, one for the atoms within its own box (N/P atoms) and one for atoms in neighbouring boxes. In the first structure each processor stores a complete set of information, i.e. coordinates, velocities etc. The data is stored in a linked list to keep track of the atoms as they move between cells. The second data structure contains atom positions only. In order to calculate the pairwise forces on each processor, the second data structures need to be communicated between the relevant processors. The scheme for communicating this data can be described in a number of steps:

1) Each processor first exchanges its second data structure in the east and west directions with neighbouring processors. For example, in figure 5 processor 1 fills a buffer with the atom positions that are within a cut-off length of processor 0's box. When the length of the box in the east/west direction (d) is less than the cut-off length this will be all of processor 1's atoms; otherwise it will contain only those nearest to box 0. The message buffer is then sent from processor 1 to processor 0 (i.e. westwards). All processors do this, and hence processor 1 also receives a message from processor 2 (from the east). The process is then repeated in the opposite direction (processor 1 sends to processor 2 and receives from processor 0). If the length of the box is greater than the cut-off length then all the necessary atoms have now been received. If, however, the length of the box is less than the cut-off length, further communication is required and the east/west procedure is repeated. For example, processor 1 sends to processor 2 the atom positions from processor 0 (which processor 1 now holds). This process is repeated until each processor has all the atom positions within the cut-off range of its box.

2) The procedure is repeated in the north/south direction. In this case, however, the data sent to an adjacent processor contains not only the atom positions that the sending processor owns but also those atom positions in its second data structure that are needed by the receiving processor (e.g. when the box length equals the cut-off limit, three boxes are sent). See figure 5.
3) The procedure is repeated in the up and down direction. When the box length equals the cut-off limit, an entire plane of boxes is sent. See figure 5.

Figure 5: Schematic representation of data passing for spatial decomposition (after Plimpton, 1995). Panels: a) east/west exchanges; b) north/south exchanges; c) up/down exchanges.
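The east/west step of the scheme above can be sketched in one dimension. The following Python fragment is an illustration only: it assumes a non-periodic line of equally sized boxes, simulates the message passing with ordinary lists instead of a message passing library, and uses hypothetical function names. It shows that when the box length d is shorter than the cut-off, the exchange has to be repeated, each round forwarding the positions received in the previous round.

```python
import math

def exchanges_needed(box_len, cutoff):
    """Number of east/west rounds needed to reach every box within the cut-off."""
    return max(1, math.ceil(cutoff / box_len))

def gather_ghosts_1d(owned, box_len, cutoff):
    """owned[p] holds the x coordinates of the atoms in box p, which spans
    [p*box_len, (p+1)*box_len). Returns ghosts[p], the positions owned by
    other boxes that lie within the cut-off of box p."""
    nproc = len(owned)
    ghosts = [[] for _ in range(nproc)]
    # In the first round every processor offers its own atoms; in later
    # rounds it forwards what it has just received from the far side.
    send_west = [list(atoms) for atoms in owned]
    send_east = [list(atoms) for atoms in owned]
    for _ in range(exchanges_needed(box_len, cutoff)):
        recv_from_east = [[] for _ in range(nproc)]
        recv_from_west = [[] for _ in range(nproc)]
        for p in range(nproc):
            lo, hi = p * box_len, (p + 1) * box_len
            if p + 1 < nproc:
                # East neighbour sends westwards only what box p needs.
                recv_from_east[p] = [x for x in send_west[p + 1] if x - hi < cutoff]
            if p - 1 >= 0:
                # West neighbour sends eastwards only what box p needs.
                recv_from_west[p] = [x for x in send_east[p - 1] if lo - x < cutoff]
        for p in range(nproc):
            ghosts[p].extend(recv_from_east[p])
            ghosts[p].extend(recv_from_west[p])
        send_west, send_east = recv_from_east, recv_from_west
    return ghosts
```

For example, with boxes of length 1.0 and a cut-off of 2.5, `exchanges_needed` gives three rounds, whereas a box length of 3.0 needs only one. In every round each processor talks only to its two nearest neighbours in that direction.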

An important feature is that when the box length is less than the cut-off distance and atom information is needed from more distant boxes, this only requires a few extra data exchanges, all of which occur with the six immediate neighbour processors. This allows the algorithm to perform efficiently, even when a large number of processors is used for a small problem. One example of the use of spatial decomposition techniques is the large-scale molecular dynamics code developed by Belak (1993) for the BBN-TC

4.2.2 Long Range Forces

The long range forces typically encountered in molecular dynamics simulations include coulombic interactions in ionic solids or biological systems, and normally involve each atom interacting with all the other atoms. Direct computation of these forces scales as $N^2$ and becomes computationally prohibitive for large values of N (the number of particles). One method used to address this is the Ewald summation. This method (Ewald, 1921) involves calculating three different terms, the sum of which gives the coulombic energy of the system. These terms are:

1) A sum in reciprocal space. This term has a cubic dependence on the chosen range of the reciprocal space and a linear dependence on the number of ions in the replicating cell.

2) A sum in real space. This is quadratically dependent on the number of ions.

3) A constant which only requires calculation once in the simulation.

Smith et al. (1993b) discussed the parallelisation of the reciprocal and real space sums using a replicated data strategy. They described two different methods for determining the reciprocal sum: the first assigns a specific subset of the ions to each processor (the reduced ion list method), and the second allocates to each processor a unique set of the reciprocal vectors (k) in the sum (the reduced k vector list method). The former method needs to communicate between the processors during the force calculation while the latter does not.
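The difference between the two ways of partitioning the reciprocal sum can be illustrated with a small Python sketch. It is not the implementation of Smith et al.: it evaluates only a reciprocal-space energy-like quantity, $\sum_k w(k)\,|S(k)|^2$ with structure factor $S(k) = \sum_j q_j \exp(i\,k \cdot r_j)$ and weight $w(k) = \exp(-k^2/4\alpha^2)/k^2$, omits physical prefactors, and simulates the processors with plain loops. The function names and the parameter `alpha` are assumptions made for the example.

```python
import cmath
import math

def structure_factor(k, ions):
    """S(k) summed over the given ions; each ion is (charge, (x, y, z))."""
    return sum(q * cmath.exp(1j * (k[0] * x + k[1] * y + k[2] * z))
               for q, (x, y, z) in ions)

def weight(k, alpha):
    k2 = k[0] ** 2 + k[1] ** 2 + k[2] ** 2
    return math.exp(-k2 / (4.0 * alpha ** 2)) / k2

def recip_reduced_ion_list(kvecs, ions, alpha, nproc):
    """Each processor sums over its own subset of ions for every k vector."""
    partial = [[structure_factor(k, ions[p::nproc]) for k in kvecs]
               for p in range(nproc)]
    # The partial structure factors must be summed across processors
    # before |S(k)|^2 can be formed: this is the communication step.
    total = [sum(partial[p][n] for p in range(nproc)) for n in range(len(kvecs))]
    return sum(weight(k, alpha) * abs(s) ** 2 for k, s in zip(kvecs, total))

def recip_reduced_k_list(kvecs, ions, alpha, nproc):
    """Each processor handles its own subset of k vectors over all ions."""
    per_proc = [sum(weight(k, alpha) * abs(structure_factor(k, ions)) ** 2
                    for k in kvecs[p::nproc])
                for p in range(nproc)]
    # Only the final scalar contributions need to be combined.
    return sum(per_proc)
```

Both routines return the same value. The reduced k vector list version relies on every processor holding all ion positions, which is exactly what the replicated data strategy provides.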
Kalia et al. (1993) described atom and spatial decomposition methods for the treatment of the Ewald sum. Another method for calculating the long range forces, the cell multipole method, was described by Ding et al. (1992); this technique is well suited to parallel systems.

4.2.3 Summary

Atom decomposition, or replicated data, techniques are attractive because of their relative ease of implementation. The need for global communications, however, results in poor scaling, with communication costs dominating when large numbers of processors are used. The method is also expensive in memory because the data is replicated on each processor. Nevertheless, this technique is used extensively and successfully in a number of codes, as later sections show. Spatial decomposition is ideally suited to large molecular dynamics simulations and scales well. Although it is more complex to implement than other techniques, this method is likely to give the best performance increase.

4.3 Parallelisation of First Principle Calculations

The parallelisation of ab-initio calculations uses some of the techniques described for molecular dynamics. Replicated data strategies have been utilised extensively and systolic loop strategies have also been described. Distributed data techniques have also been exploited successfully to parallelise ab-initio codes. In this section a number of the techniques used to parallelise ab-initio codes are described.

4.3.1 Hartree-Fock Theory

The most computationally intensive part of the construction of the Fock matrix is the calculation of the two electron integrals. The density matrix, D, and the Fock matrix, F, are symmetric, and for any (i,j,k,l) the following integrals are equivalent:

$(ij|kl) = (ji|kl) = (ij|lk) = (ji|lk) = (kl|ij) = (kl|ji) = (lk|ij) = (lk|ji)$

Hence, once $(ij|kl)$ has been computed, the elements $F_{ij}$, $F_{ik}$, $F_{il}$, $F_{jk}$, $F_{jl}$ and $F_{kl}$ can be updated with the product of this integral and the appropriate element of the density matrix. Thus, rather than having to compute $N^4$ integrals, only $\sim N^4/8$ integrals need to be calculated. Screening is often used to reduce the number of integrals requiring calculation further: integrals which are so small that they are negligible are simply eliminated. This can reduce the number of integrals from $O(N^4)$ to $O(N^2)$ in some cases. The fact that each integral may be computed independently means that the integral evaluation can be parallelised.

The replicated data method follows the same principle as that described for molecular dynamics. The density and Fock matrices are replicated on each processor, and each processor computes a subset of the integrals to form a partial Fock matrix. The partial Fock matrices are then globally summed to form the total Fock matrix. This method benefits from the fact that the integrals and the required density and Fock matrix elements are all local to each other.
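The two ideas above, the eight-fold integral symmetry and the replicated data Fock build, can be combined in a short Python sketch. Several things here are assumptions made for illustration: `toy_integral` is a made-up function that merely has the right symmetry (it is not a real two electron integral), the update shown is the usual closed-shell Coulomb and exchange contribution, the processors are simulated by a loop over ranks, and the global sum is an ordinary element-wise addition.

```python
def toy_integral(i, j, k, l):
    """Hypothetical stand-in for (ij|kl) with the eight-fold symmetry."""
    def g(a, b):
        return 1.0 / (1.0 + a + b) + 0.1 * a * b
    return g(i, j) * g(k, l)

def canonical_quadruples(n):
    """One representative (i >= j, k >= l, ij >= kl) per symmetry class."""
    for i in range(n):
        for j in range(i + 1):
            ij = i * (i + 1) // 2 + j
            for k in range(i + 1):
                for l in range(k + 1):
                    if k * (k + 1) // 2 + l <= ij:
                        yield (i, j, k, l)

def equivalent_indices(i, j, k, l):
    """Distinct index orderings sharing the same integral value."""
    return {(i, j, k, l), (j, i, k, l), (i, j, l, k), (j, i, l, k),
            (k, l, i, j), (l, k, i, j), (k, l, j, i), (l, k, j, i)}

def partial_fock(n, density, rank, nproc):
    """Partial two electron Fock matrix from this rank's share of integrals."""
    f = [[0.0] * n for _ in range(n)]
    for count, quad in enumerate(canonical_quadruples(n)):
        if count % nproc != rank:          # round-robin distribution
            continue
        v = toy_integral(*quad)            # computed once per symmetry class
        for a, b, c, d in equivalent_indices(*quad):
            f[a][b] += density[c][d] * v          # Coulomb-type update
            f[a][c] -= 0.5 * density[b][d] * v    # exchange-type update
    return f

def replicated_data_fock(n, density, nproc):
    """Global sum of the partial Fock matrices held by each rank."""
    parts = [partial_fock(n, density, r, nproc) for r in range(nproc)]
    return [[sum(p[i][j] for p in parts) for j in range(n)] for i in range(n)]

def naive_fock(n, density):
    """Reference build looping over all n**4 index combinations."""
    f = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            for c in range(n):
                for d in range(n):
                    v = toy_integral(a, b, c, d)
                    f[a][b] += density[c][d] * v
                    f[a][c] -= 0.5 * density[b][d] * v
    return f
```

With M = N(N+1)/2 index pairs, the number of representatives generated is M(M+1)/2, which approaches $N^4/8$ for large N.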
There is no communication between processors except for the global summation of the Fock matrix and the broadcast of the density matrix. The implementation is also relatively simple. The main drawback of this technique is that the size of the problem is limited by the memory of each individual processor rather than by the complete (aggregate) memory of the machine. There are several examples of the use of the replicated data scheme for SCF methods; one of the best known is that of Cooper et al. (1991), who parallelised GAMESS-UK (Guest et al., 1992) on a transputer based system.

An alternative to replicated data schemes is the use of distributed data algorithms. Burkhart et al. (1993) distributed the integral evaluation between processors and accumulated the Fock matrix on one fast processor with a large amount of memory. One problem with this algorithm was the serial accumulation of the Fock matrix on a single processor, which limited the speed up to the ratio:

$$\frac{\text{time taken to compute the integrals}}{\text{time required to send them to the master processor and add them to the Fock matrix}}$$

The model Burkhart et al. considered was a farm model, in which one specific processor (the master processor) generates the jobs and distributes them to the other (server) processors. In this situation the number of data sets must significantly exceed the number of processors. To optimise the efficiency, the authors considered two different communication options and two different methods of updating the Fock matrix.

Communication options:

1) The master processor distributes the jobs to all the server processors and then receives the results from all the server processors. The master processor then schedules new jobs on the now idle server processors (global communication management).

2) Each server processor decides whether to process a given job or to send it to another server (local communication management).

Fock matrix generation:

1) All the calculated integrals are returned to the master processor, where the Fock matrix is calculated using the integrals and the density matrix (sequential Fock matrix update).

2) Each server receives the density matrix and builds its own partial Fock matrix (distributed Fock matrix update).

The problem with the sequential Fock matrix update was that the Fock matrix determination created a bottleneck for the required communications. The authors achieved better results with this technique when local rather than global communication management was used; however, they concluded that for more than sixteen processors the distributed Fock matrix update was the most effective.

A further development came from Colvin et al. (1993), who utilised a systolic loop scheme similar to those mentioned previously for molecular dynamics. Colvin's method involves setting up a systolic loop and passing data packets around the ring. Each processor hosts a sub-block of the Fock matrix and of the density matrix. A second copy of both the Fock and density matrices is formed and these copies are passed around the ring.
After each send, the processor forms all the interactions that connect the current density and Fock matrix elements, then passes the data to the left and receives new data from the right, where the process is repeated. If there are P processors, P sends are required in order to send all the density and Fock matrix blocks around the ring. After this the full two electron Fock matrix has been formed, the one electron terms are added and the Fock matrix is ready for transformation and diagonalisation. The number of integrals that need to be calculated is $3N^4/8$, and the problems with load balancing have forced the use of asynchronous communications and double buffering.

As mentioned before, the number of integrals requiring computation can be reduced to $N^4/8$ by considering equivalency. In general the calculation of each element of the Fock matrix requires access to all the elements of the density matrix and to the array which holds the elements i,j,k,l (the Z matrix). Each integral requires six elements of the density matrix and contributes to six of the Fock matrix elements, i.e. there are $N^4/8$ integral computations. Each data element is accessed by a number of integral computations, hence the computations need to be able to access the data in an asynchronous and distributed fashion. Each integral computation must perform sixteen communications: six to obtain the density matrix elements, four to obtain the Z matrix elements and six to store the Fock matrix elements. Rather than using replicated data techniques we can use partial replication techniques, which have been described in detail by Foster et al. (1996).
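The circulation pattern of the systolic loop scheme described above can be checked with a few lines of Python. This is a simplified sketch under stated assumptions: block contents are replaced by integer labels, the processors are simulated by list positions, and the function name is hypothetical. Each processor keeps its resident block, combines it with the travelling block it currently holds, then passes the travelling block to the left and receives a new one from the right.

```python
def systolic_ring(nproc):
    """Simulate nproc shifts of travelling blocks around a ring.

    Returns (seen, travelling, sends): seen[p] is the set of travelling
    blocks that visited processor p, travelling[p] is the block held by
    processor p at the end, and sends is the number of shift steps."""
    travelling = list(range(nproc))    # travelling[p] starts as block p
    seen = [set() for _ in range(nproc)]
    sends = 0
    for _ in range(nproc):
        for p in range(nproc):
            # Here a real code would form all interactions between the
            # resident block p and the visiting block travelling[p].
            seen[p].add(travelling[p])
        # Everyone passes to the left, so processor p receives the
        # block that was held by its right-hand neighbour p + 1.
        travelling = [travelling[(p + 1) % nproc] for p in range(nproc)]
        sends += 1
    return seen, travelling, sends
```

After P shifts every processor has met every travelling block once and each block is back on its home processor, which matches the count of P sends quoted in the text.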

CRYSTAL in parallel: replicated and distributed (MPP) data. Why parallel?

CRYSTAL in parallel: replicated and distributed (MPP) data. Why parallel? CRYSTAL in parallel: replicated and distributed (MPP) data Roberto Orlando Dipartimento di Chimica Università di Torino Via Pietro Giuria 5, 10125 Torino (Italy) roberto.orlando@unito.it 1 Why parallel?

More information

Density Functional Theory

Density Functional Theory Density Functional Theory Iain Bethune EPCC ibethune@epcc.ed.ac.uk Overview Background Classical Atomistic Simulation Essential Quantum Mechanics DFT: Approximations and Theory DFT: Implementation using

More information

Overview: Synchronous Computations

Overview: Synchronous Computations Overview: Synchronous Computations barriers: linear, tree-based and butterfly degrees of synchronization synchronous example 1: Jacobi Iterations serial and parallel code, performance analysis synchronous

More information

MO Calculation for a Diatomic Molecule. /4 0 ) i=1 j>i (1/r ij )

MO Calculation for a Diatomic Molecule. /4 0 ) i=1 j>i (1/r ij ) MO Calculation for a Diatomic Molecule Introduction The properties of any molecular system can in principle be found by looking at the solutions to the corresponding time independent Schrodinger equation

More information

Session 1. Introduction to Computational Chemistry. Computational (chemistry education) and/or (Computational chemistry) education

Session 1. Introduction to Computational Chemistry. Computational (chemistry education) and/or (Computational chemistry) education Session 1 Introduction to Computational Chemistry 1 Introduction to Computational Chemistry Computational (chemistry education) and/or (Computational chemistry) education First one: Use computational tools

More information

Density Functional Theory

Density Functional Theory Density Functional Theory March 26, 2009 ? DENSITY FUNCTIONAL THEORY is a method to successfully describe the behavior of atomic and molecular systems and is used for instance for: structural prediction

More information

CHEM3023: Spins, Atoms and Molecules

CHEM3023: Spins, Atoms and Molecules CHEM3023: Spins, Atoms and Molecules Lecture 5 The Hartree-Fock method C.-K. Skylaris Learning outcomes Be able to use the variational principle in quantum calculations Be able to construct Fock operators

More information

Example questions for Molecular modelling (Level 4) Dr. Adrian Mulholland

Example questions for Molecular modelling (Level 4) Dr. Adrian Mulholland Example questions for Molecular modelling (Level 4) Dr. Adrian Mulholland 1) Question. Two methods which are widely used for the optimization of molecular geometies are the Steepest descents and Newton-Raphson

More information

Barrier. Overview: Synchronous Computations. Barriers. Counter-based or Linear Barriers

Barrier. Overview: Synchronous Computations. Barriers. Counter-based or Linear Barriers Overview: Synchronous Computations Barrier barriers: linear, tree-based and butterfly degrees of synchronization synchronous example : Jacobi Iterations serial and parallel code, performance analysis synchronous

More information

4th year Project demo presentation

4th year Project demo presentation 4th year Project demo presentation Colm Ó héigeartaigh CASE4-99387212 coheig-case4@computing.dcu.ie 4th year Project demo presentation p. 1/23 Table of Contents An Introduction to Quantum Computing The

More information

ab initio Electronic Structure Calculations

ab initio Electronic Structure Calculations ab initio Electronic Structure Calculations New scalability frontiers using the BG/L Supercomputer C. Bekas, A. Curioni and W. Andreoni IBM, Zurich Research Laboratory Rueschlikon 8803, Switzerland ab

More information

What is Classical Molecular Dynamics?

What is Classical Molecular Dynamics? What is Classical Molecular Dynamics? Simulation of explicit particles (atoms, ions,... ) Particles interact via relatively simple analytical potential functions Newton s equations of motion are integrated

More information

TIME DEPENDENCE OF SHELL MODEL CALCULATIONS 1. INTRODUCTION

TIME DEPENDENCE OF SHELL MODEL CALCULATIONS 1. INTRODUCTION Mathematical and Computational Applications, Vol. 11, No. 1, pp. 41-49, 2006. Association for Scientific Research TIME DEPENDENCE OF SHELL MODEL CALCULATIONS Süleyman Demirel University, Isparta, Turkey,

More information

Density Functional Theory. Martin Lüders Daresbury Laboratory

Density Functional Theory. Martin Lüders Daresbury Laboratory Density Functional Theory Martin Lüders Daresbury Laboratory Ab initio Calculations Hamiltonian: (without external fields, non-relativistic) impossible to solve exactly!! Electrons Nuclei Electron-Nuclei

More information

Non-Born-Oppenheimer Effects Between Electrons and Protons

Non-Born-Oppenheimer Effects Between Electrons and Protons Non-Born-Oppenheimer Effects Between Electrons and Protons Kurt Brorsen Department of Chemistry University of Illinois at Urbana-Champaign PI: Sharon Hammes-Schiffer Funding: NSF, AFOSR Computer time:

More information

Handbook of Computational Quantum Chemistry. DAVID B. COOK The Department of Chemistry, University of Sheffield

Handbook of Computational Quantum Chemistry. DAVID B. COOK The Department of Chemistry, University of Sheffield Handbook of Computational Quantum Chemistry DAVID B. COOK The Department of Chemistry, University of Sheffield Oxford New York Tokyo OXFORD UNIVERSITY PRESS 1998 CONTENTS 1 Mechanics and molecules 1 1.1

More information

Electron Correlation

Electron Correlation Electron Correlation Levels of QM Theory HΨ=EΨ Born-Oppenheimer approximation Nuclear equation: H n Ψ n =E n Ψ n Electronic equation: H e Ψ e =E e Ψ e Single determinant SCF Semi-empirical methods Correlation

More information

Efficient algorithms for symmetric tensor contractions

Efficient algorithms for symmetric tensor contractions Efficient algorithms for symmetric tensor contractions Edgar Solomonik 1 Department of EECS, UC Berkeley Oct 22, 2013 1 / 42 Edgar Solomonik Symmetric tensor contractions 1/ 42 Motivation The goal is to

More information

A Computation- and Communication-Optimal Parallel Direct 3-body Algorithm

A Computation- and Communication-Optimal Parallel Direct 3-body Algorithm A Computation- and Communication-Optimal Parallel Direct 3-body Algorithm Penporn Koanantakool and Katherine Yelick {penpornk, yelick}@cs.berkeley.edu Computer Science Division, University of California,

More information

University of Denmark, Bldg. 307, DK-2800 Lyngby, Denmark, has been developed at CAMP based on message passing, currently

University of Denmark, Bldg. 307, DK-2800 Lyngby, Denmark, has been developed at CAMP based on message passing, currently Parallel ab-initio molecular dynamics? B. Hammer 1, and Ole H. Nielsen 1 2 1 Center for Atomic-scale Materials Physics (CAMP), Physics Dept., Technical University of Denmark, Bldg. 307, DK-2800 Lyngby,

More information

Parallelization of the QC-lib Quantum Computer Simulator Library

Parallelization of the QC-lib Quantum Computer Simulator Library Parallelization of the QC-lib Quantum Computer Simulator Library Ian Glendinning and Bernhard Ömer VCPC European Centre for Parallel Computing at Vienna Liechtensteinstraße 22, A-19 Vienna, Austria http://www.vcpc.univie.ac.at/qc/

More information

A Quantum Chemistry Domain-Specific Language for Heterogeneous Clusters

A Quantum Chemistry Domain-Specific Language for Heterogeneous Clusters A Quantum Chemistry Domain-Specific Language for Heterogeneous Clusters ANTONINO TUMEO, ORESTE VILLA Collaborators: Karol Kowalski, Sriram Krishnamoorthy, Wenjing Ma, Simone Secchi May 15, 2012 1 Outline!

More information

Quantum Chemical Calculations by Parallel Computer from Commodity PC Components

Quantum Chemical Calculations by Parallel Computer from Commodity PC Components Nonlinear Analysis: Modelling and Control, 2007, Vol. 12, No. 4, 461 468 Quantum Chemical Calculations by Parallel Computer from Commodity PC Components S. Bekešienė 1, S. Sėrikovienė 2 1 Institute of

More information

MONTE CARLO METHODS IN SEQUENTIAL AND PARALLEL COMPUTING OF 2D AND 3D ISING MODEL

MONTE CARLO METHODS IN SEQUENTIAL AND PARALLEL COMPUTING OF 2D AND 3D ISING MODEL Journal of Optoelectronics and Advanced Materials Vol. 5, No. 4, December 003, p. 971-976 MONTE CARLO METHODS IN SEQUENTIAL AND PARALLEL COMPUTING OF D AND 3D ISING MODEL M. Diaconu *, R. Puscasu, A. Stancu

More information

Direct Self-Consistent Field Computations on GPU Clusters

Direct Self-Consistent Field Computations on GPU Clusters Direct Self-Consistent Field Computations on GPU Clusters Guochun Shi, Volodymyr Kindratenko National Center for Supercomputing Applications University of Illinois at UrbanaChampaign Ivan Ufimtsev, Todd

More information

Parallel PIPS-SBB Multi-level parallelism for 2-stage SMIPS. Lluís-Miquel Munguia, Geoffrey M. Oxberry, Deepak Rajan, Yuji Shinano

Parallel PIPS-SBB Multi-level parallelism for 2-stage SMIPS. Lluís-Miquel Munguia, Geoffrey M. Oxberry, Deepak Rajan, Yuji Shinano Parallel PIPS-SBB Multi-level parallelism for 2-stage SMIPS Lluís-Miquel Munguia, Geoffrey M. Oxberry, Deepak Rajan, Yuji Shinano ... Our contribution PIPS-PSBB*: Multi-level parallelism for Stochastic

More information

J S Parker (QUB), Martin Plummer (STFC), H W van der Hart (QUB) Version 1.0, September 29, 2015

J S Parker (QUB), Martin Plummer (STFC), H W van der Hart (QUB) Version 1.0, September 29, 2015 Report on ecse project Performance enhancement in R-matrix with time-dependence (RMT) codes in preparation for application to circular polarised light fields J S Parker (QUB), Martin Plummer (STFC), H

More information

v(r i r j ) = h(r i )+ 1 N

v(r i r j ) = h(r i )+ 1 N Chapter 1 Hartree-Fock Theory 1.1 Formalism For N electrons in an external potential V ext (r), the many-electron Hamiltonian can be written as follows: N H = [ p i i=1 m +V ext(r i )]+ 1 N N v(r i r j

More information

Introduction to Hartree-Fock Molecular Orbital Theory

Introduction to Hartree-Fock Molecular Orbital Theory Introduction to Hartree-Fock Molecular Orbital Theory C. David Sherrill School of Chemistry and Biochemistry Georgia Institute of Technology Origins of Mathematical Modeling in Chemistry Plato (ca. 428-347

More information

Introduction to Benchmark Test for Multi-scale Computational Materials Software

Introduction to Benchmark Test for Multi-scale Computational Materials Software Introduction to Benchmark Test for Multi-scale Computational Materials Software Shun Xu*, Jian Zhang, Zhong Jin xushun@sccas.cn Computer Network Information Center Chinese Academy of Sciences (IPCC member)

More information

Exploring the energy landscape

Exploring the energy landscape Exploring the energy landscape ChE210D Today's lecture: what are general features of the potential energy surface and how can we locate and characterize minima on it Derivatives of the potential energy

More information

Same idea for polyatomics, keep track of identical atom e.g. NH 3 consider only valence electrons F(2s,2p) H(1s)

Same idea for polyatomics, keep track of identical atom e.g. NH 3 consider only valence electrons F(2s,2p) H(1s) XIII 63 Polyatomic bonding -09 -mod, Notes (13) Engel 16-17 Balance: nuclear repulsion, positive e-n attraction, neg. united atom AO ε i applies to all bonding, just more nuclei repulsion biggest at low

More information

Parallelization of the QC-lib Quantum Computer Simulator Library

Parallelization of the QC-lib Quantum Computer Simulator Library Parallelization of the QC-lib Quantum Computer Simulator Library Ian Glendinning and Bernhard Ömer September 9, 23 PPAM 23 1 Ian Glendinning / September 9, 23 Outline Introduction Quantum Bits, Registers

More information

Development of Molecular Dynamics Simulation System for Large-Scale Supra-Biomolecules, PABIOS (PArallel BIOmolecular Simulator)

Development of Molecular Dynamics Simulation System for Large-Scale Supra-Biomolecules, PABIOS (PArallel BIOmolecular Simulator) Development of Molecular Dynamics Simulation System for Large-Scale Supra-Biomolecules, PABIOS (PArallel BIOmolecular Simulator) Group Representative Hisashi Ishida Japan Atomic Energy Research Institute

More information

Efficiency of Dynamic Load Balancing Based on Permanent Cells for Parallel Molecular Dynamics Simulation

Efficiency of Dynamic Load Balancing Based on Permanent Cells for Parallel Molecular Dynamics Simulation Efficiency of Dynamic Load Balancing Based on Permanent Cells for Parallel Molecular Dynamics Simulation Ryoko Hayashi and Susumu Horiguchi School of Information Science, Japan Advanced Institute of Science

More information

Conquest order N ab initio Electronic Structure simulation code for quantum mechanical modelling in large scale

Conquest order N ab initio Electronic Structure simulation code for quantum mechanical modelling in large scale Fortran Expo: 15 Jun 2012 Conquest order N ab initio Electronic Structure simulation code for quantum mechanical modelling in large scale Lianheng Tong Overview Overview of Conquest project Brief Introduction

More information

DENSITY FUNCTIONAL THEORY FOR NON-THEORISTS JOHN P. PERDEW DEPARTMENTS OF PHYSICS AND CHEMISTRY TEMPLE UNIVERSITY

DENSITY FUNCTIONAL THEORY FOR NON-THEORISTS JOHN P. PERDEW DEPARTMENTS OF PHYSICS AND CHEMISTRY TEMPLE UNIVERSITY DENSITY FUNCTIONAL THEORY FOR NON-THEORISTS JOHN P. PERDEW DEPARTMENTS OF PHYSICS AND CHEMISTRY TEMPLE UNIVERSITY A TUTORIAL FOR PHYSICAL SCIENTISTS WHO MAY OR MAY NOT HATE EQUATIONS AND PROOFS REFERENCES

More information
