AB INITIO MOLECULAR DYNAMICS (AIMD) A NEW APPROACH FOR DEVELOPMENT OF ACCURATE POTENTIALS. Milind M Malshe


AB INITIO MOLECULAR DYNAMICS (AIMD): A NEW APPROACH FOR DEVELOPMENT OF ACCURATE POTENTIALS. By Milind M. Malshe. Bachelor of Engineering, University of Mumbai, Mumbai, India, 1998. Submitted to the Faculty of the Graduate College of the Oklahoma State University in partial fulfillment of the requirements for the Degree of MASTER OF SCIENCE, December 2004.

AB INITIO MOLECULAR DYNAMICS (AIMD): A NEW APPROACH FOR DEVELOPMENT OF ACCURATE POTENTIALS. Thesis Approved: Dr. R. Komanduri, Thesis Advisor; Dr. S. Roy; Dr. H. Lu; Dr. M. Hagan; Dr. L. M. Raff; Dr. A. Gordon Emslie, Dean of the Graduate College.

SUMMARY

In this study a new approach is presented for the development of accurate potential-energy hypersurfaces based on ab initio calculations that can be utilized to conduct molecular dynamics and Monte Carlo simulations to study chemical and mechanical properties at the atomistic level. The method integrates ab initio electronic structure calculations with the interpolation capability of multilayer neural networks. A sampling technique based on novelty detection is also developed to ensure that the neural network fit for the potential energy spans the entire configuration space involved in the simulation. The procedure can be initiated using an empirical potential or a direct dynamics simulation. The procedure is applied to develop the potential-energy hypersurface for 5-atom clusters within a silicon workpiece. Ab initio calculations were performed using the Gaussian 98 electronic structure program. Results for 5-atom silicon clusters representing the bulk and the surface structure are presented.

ACKNOWLEDGEMENTS

I would like to express my gratitude to my parents and brother for their support, encouragement, love, and confidence in me. I would like to thank my advisor, Dr. R. Komanduri, for his inspiration, technical guidance, and financial support. I wish to express my sincere appreciation to Dr. L. M. Raff for introducing me to the concepts of molecular dynamics and Monte Carlo simulations and for providing guidance and encouragement throughout the study. I would like to thank Dr. M. Hagan for introducing me to the field of neural networks and for his invaluable guidance and encouragement. I would like to thank Dr. P. M. Agrawal for his guidance in understanding various aspects of simulation techniques. This project is a result of a coordinated effort linking simulation methods and neural networks. I would also like to thank Dr. S. Roy and Dr. H. Lu for their suggestions in this study. I am grateful to Dr. R. Komanduri and my committee members for helping me to explore and develop new methods for improving simulation techniques. This project is funded by grant DMI from the National Science Foundation. I thank Dr. W. DeVries, Dr. G. Hazelrigg, Dr. J. Chen, and Dr. D. Durham of the Division of Design, Manufacturing, and Industrial Innovation; Dr. B. M. Kramer, Engineering Centers Division; and Dr. J. Larsen-Basse, Tribology and Surface Engineering program, for their interest in and support of this work. This project was also sponsored by a DEPSCoR grant on the Multiscale Modeling and Simulation of Material

Processing (F ). I thank Dr. Craig S. Hartley of the AFOSR for his interest in and support of this work.

TABLE OF CONTENTS

1 Introduction
2 Literature Review
  2.1 Literature review of design of interatomic potentials
  2.2 Literature review of quantum mechanical calculations of silicon clusters
3 Molecular Dynamics Simulation
  3.1 Brief history of molecular dynamics simulation
  3.2 Applicability of MD simulations
  3.3 Formulation of the differential equations of motion
  3.4 Time integration algorithms
    3.4.1 Verlet algorithm
    3.4.2 Leap-frog Verlet algorithm
    3.4.3 Velocity Verlet algorithm
    3.4.4 Beeman algorithm
  3.5 Methods to improve computational speed
    3.5.1 Neighbor list and book-keeping techniques
    3.5.2 Cell structures and link lists
  3.6 Periodic boundary conditions (PBC)
4 Interatomic Potentials
  4.1 Design of inter-atomic potentials

  4.2 Classification of potential energy functions
    4.2.1 Pair potentials
    4.2.2 Many-body potentials
5 Ab Initio Calculations
  5.1 Born-Oppenheimer approximation
  5.2 Basis sets
    5.2.1 Slater-type orbitals (STO)
    5.2.2 Gaussian-type orbitals (GTO)
    5.2.3 Types of basis sets
  5.3 Hartree-Fock method
  5.4 Electron correlation methods
    5.4.1 Requirements of electron correlation theories
    5.4.2 Configuration interaction (CI)
    5.4.3 Limited configuration interaction
    5.4.4 Møller-Plesset perturbation theory (MP)
  5.5 Density functional theory (DFT)
6 Neural Networks
  6.1 Biological neurons
  6.2 History of multilayer neural networks
  6.3 Neural networks versus conventional computers
  6.4 Model of a neuron
  6.5 Multilayer network
  6.6 Transfer functions

  6.7 Neural network training
  6.8 Supervised training and parameter optimization
    6.8.1 LMS (Least Mean Squared Error) algorithm
    6.8.2 Backpropagation algorithm
  6.9 Numerical optimization techniques
    6.9.1 Conjugate gradient methods
    6.9.2 Newton and quasi-Newton methods
    6.9.3 Levenberg-Marquardt algorithm
  6.10 Bias/variance dilemma
  6.11 Neural network growing and neural network pruning
  6.12 Regularization and early stopping
    6.12.1 Early stopping
  6.13 Normalizing the data
  6.14 Neural network derivatives
    6.14.1 First derivative with respect to weights
    6.14.2 Derivatives of the transfer function
    6.14.3 First derivative with respect to inputs
  6.15 Novelty detection
    6.15.1 Minimum distance computation
7 Ab Initio Molecular Dynamics (AIMD) Approach
  7.1 Choice of the coordinate system
  7.2 Internal coordinates

  7.3 Input vector to the neural network
  7.4 Sorting and ordering the input vector for neural network training
  7.5 Procedure for sorting and ordering the input vector for AIMD
  7.6 Calculation of forces during the MD simulation
  7.7 Sampling technique
  7.8 Novelty detection
8 Results and Discussion
  8.1 Ab initio calculations
  8.2 Neural network training
  8.3 Iterating for the neural network convergence
  8.4 Neural network convergence using different empirical potentials
  8.5 A method to obtain a conservative force field
  8.6 Neural network training for surface and bulk configurations
  8.7 Neural network training for ab initio energies in carbon clusters of buckyball (C60) and carbon nanotube
9 Conclusions and Future Work
References
Appendix

LIST OF FIGURES

Fig 2.1 The geometries for the neutral and ionic clusters of Si2 to Si6
Fig 2.2 Scaled cohesive energy vs. cluster size for neutral silicon clusters calculated at the MP4/6-31G* level
Fig 2.3 Incremental binding energy vs. cluster size for neutral silicon clusters calculated at the HF/6-31G* and MP4/6-31G* levels
Fig 3.1 Cell index method
Fig 3.2 Representation of periodic boundary conditions (PBC) and the simulation cell
Fig 5.1 Description of split valence basis sets
Fig 5.2 Variation of energy with respect to radius
Fig 6.1 Schematic of a biological neuron
Fig 6.2 Synaptic gap
Fig 6.3 Model of a neuron
Fig 6.4 Single-input neuron
Fig 6.5 Multilayer network
Fig 6.6 Sigmoid transfer functions
Fig 6.7 Evaluation of error during training
Fig 6.8 Plot of neural network output for f(x) = x^2
Fig 7.1 Internal coordinates: a) bond distance, b) bond angle, and c) dihedral angle
Fig 7.2 Sign convention for the dihedral angle
Fig 7.3 Configuration of 5 silicon atoms
Fig 7.4 Contour plot of the function f(x,y) = x^2 + y^2

Fig 7.5 Effect of ordering of atoms in a configuration
Fig 8.1 Silicon cluster
Fig 8.2 Comparison of neural network output with ab initio energy for the training set
Fig 8.3 Comparison of neural network output with ab initio energy for the validation set
Fig 8.4 Comparison of neural network output with ab initio energy for the testing set
Fig 8.5 Distribution of minimum distances for the initial set of Si5 configurations
Fig 8.6 Distribution of minimum distances of the configurations in the first iteration from the initial set of configurations
Fig 8.7 Comparison of the distribution of minimum distances of the configurations in the first iteration from the initial configurations with the distribution of configurations from the Tersoff potential
Fig 8.8 Distribution of recomputed minimum distances of the concatenated set of configurations in the initial and first iterations
Fig 8.9 Parzen's distribution of minimum distances for the initial set of configurations from the Tersoff potential
Fig 8.10 Parzen's distribution of minimum distances for the configurations stored while using the neural network trained for ab initio energy during the MD simulation
Fig 8.11 Distribution of minimum distances of the configurations during the second iteration from the configurations during the first iteration
Fig 8.12 Parzen's distribution of the minimum distances for the concatenated set of configurations in the initial and first iterations

Fig 8.13 Distribution of minimum distances of all configurations stored using the modified Tersoff potential
Fig 8.14 Comparison of Tersoff and modified Tersoff potential energies (eV)
Fig 8.15 Comparison of neural network output energy (eV) for …
Fig 8.17 Variation of the total energy
Fig 8.18 Variation of the potential energy
Fig 8.19 Phonon frequencies for silicon
Fig 8.20 Comparison of neural network output with ab initio energy for the training set
Fig 8.21 Comparison of neural network output with ab initio energy for the validation set
Fig 8.22 Comparison of neural network output with ab initio energy for the testing set
Fig 8.23 Comparison of neural network output with ab initio energy for the training set

CHAPTER 1

Introduction

Computer simulation techniques play an important role in the study of various chemical, biological, and mechanical processes at the atomistic level. By comparing the results of simulations with experimental observations, the theoretical model used in the simulation can be corrected and validated. Once validated, the simulations can be used to study the system under conditions for which experimental results are not easily available. By analyzing the results from the simulations, experiments can then be performed under specific conditions to achieve desired results. Thus computer simulations can complement both theoretical and experimental approaches. In conjunction with developments in computer technology, the use of simulations has greatly extended the range of problems that can be studied. In chemistry, simulations are performed to study reaction dynamics. In engineering, simulations are used to study the behavior of a material under varying conditions in processes such as machining, indentation, and melting. Such simulations can provide useful insight into important mechanisms, such as phase transformation and dislocation dynamics, which are not easily understood using other techniques. Simulation techniques can be broadly classified into two groups: stochastic and deterministic. Stochastic simulations are based on statistical mechanics, which relates

macroscopic properties, such as pressure and temperature, to the distribution and motion of atoms and molecules, whereas the deterministic approach is based on classical mechanics. Examples of the deterministic and stochastic approaches are molecular dynamics and Monte Carlo simulations, respectively. Molecular dynamics simulations generate information at the microscopic level, such as atomic positions and velocities, by integrating the equations of motion and following the time evolution of a set of interacting atoms. Statistical mechanics deals with macroscopic systems from a molecular point of view, where macroscopic phenomena are studied from the properties of the individual molecules making up the system. The most important part of an MD/MC simulation is the development of the potential energy surface, which describes the interactions of the atoms within the system. It should provide a realistic description of the interatomic interactions so as to match experimental observations. The usual approach for developing a potential is to determine a functional form motivated by physical intuition and then to adjust the parameters to ab initio data and/or some physical properties, to come up with an empirical potential. Although such empirical potentials provide a simple and physically interpretable description of the interatomic interactions, their applicability is limited to the type of data to which they were fitted. Once fitted, there is no easy way to improve a potential without refitting the data. A solution to this problem is to model the system using ab initio electronic structure calculations, solving Schrödinger's equation to compute the potential energy and forces from first principles. Such techniques provide a very accurate description of the interatomic interactions, but they are computationally very expensive and hence are limited to small systems involving only a few atoms.
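The deterministic machinery mentioned above, integrating the classical equations of motion, can be illustrated with the velocity Verlet scheme (one of the integrators covered in Chapter 3). The sketch below is purely illustrative: it uses a hypothetical one-dimensional harmonic force rather than an interatomic potential, and the function names are my own.

```python
import numpy as np

def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate Newton's equations of motion with the velocity Verlet scheme."""
    f = force(x)
    traj = [x]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * (f / mass) * dt**2      # position update
        f_new = force(x)
        v = v + 0.5 * (f + f_new) / mass * dt          # velocity update
        f = f_new
        traj.append(x)
    return np.array(traj), v

# hypothetical test system: 1-D harmonic oscillator, F(x) = -k x
k, m = 1.0, 1.0
traj, v_end = velocity_verlet(x=1.0, v=0.0,
                              force=lambda x: -k * x,
                              mass=m, dt=0.01, n_steps=628)
# after roughly one period (T = 2*pi) the particle returns near x = 1,
# and the total energy 0.5*m*v^2 + 0.5*k*x^2 stays close to 0.5
```

The same two-step update generalizes directly to vectors of atomic positions once `force` evaluates the gradient of a many-body potential.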

In this study a new approach is presented for the development of accurate potential-energy hypersurfaces based on ab initio calculations that can be utilized to conduct molecular dynamics and Monte Carlo simulations to study chemical, mechanical, and electrical properties at the atomistic level. The method integrates ab initio electronic structure calculations with the interpolation capability of multilayer neural networks. A sampling technique based on novelty detection is also developed to ensure that the neural network fit for the potential energy spans the entire configuration space involved in the simulation. Chapter 2 gives a literature review of various empirical potentials and the different design approaches employed for modeling different situations. Chapter 3 explains the molecular dynamics technique, various integration algorithms, and the implementation of periodic boundary conditions. Chapter 4 gives the classification and design of empirical potentials and details of some of the important empirical potentials developed for covalent materials, particularly silicon. Chapter 5 provides a detailed description of ab initio techniques: different types of basis sets, the Hartree-Fock method, and electron correlation methods, such as configuration interaction, perturbation theory, and density functional theory, are discussed. Chapter 6 discusses multilayer neural networks, training algorithms, and techniques to achieve an accurate fit. A sampling technique called novelty detection is also discussed. Chapter 7 presents the new technique for ab initio molecular dynamics and discusses the various steps involved. Chapter 8 presents the results for the silicon system. Chapter 9 summarizes the conclusions of this study.
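The core of the sampling idea just outlined — flag configurations that lie far from the existing training data as "novel" and send only those for ab initio evaluation — can be sketched in miniature with a minimum-distance criterion. Everything below is an illustration only: the function names, the distance metric, and the threshold value are hypothetical, and the actual procedure is developed in Chapters 6 and 7.

```python
import numpy as np

def min_distance(config, training_set):
    """Euclidean distance from a configuration vector to its nearest
    neighbor in the training set (rows = stored configurations)."""
    return np.min(np.linalg.norm(training_set - config, axis=1))

def select_novel(configs, training_set, threshold):
    """Keep only the configurations farther than `threshold` from every
    training point; these would be sent for ab initio evaluation."""
    return [c for c in configs if min_distance(c, training_set) > threshold]

# toy data: training configurations near the origin, one clearly novel point
train = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]])
new = [np.array([0.05, 0.05]), np.array([2.0, 2.0])]
novel = select_novel(new, train, threshold=0.5)
# only the distant configuration [2.0, 2.0] is flagged as novel
```

In the full method the vectors would be internal-coordinate descriptors of atomic configurations, and flagged points would be computed ab initio and appended to the training set before the network is refitted.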

CHAPTER 2

Literature Review

The most important part of an MD/MC simulation is the development of the potential energy surface, which must model the system under consideration sufficiently closely to the actual one. In recent years many empirical potentials for silicon, such as the Stillinger-Weber, Tersoff, Bolding-Anderson, and Brenner potentials, have been developed and applied to a number of different systems.

2.1 Literature review of design of interatomic potentials

Existing models for interatomic potentials differ in their degree of sophistication, functional form, fitting strategy, and range of applicability, each one being designed for a specific problem. Not only are surfaces and small clusters difficult to model (Balamane et al., 1992; Kaxiras, 1996); even bulk material, including the crystalline and amorphous phases, solid defects, and the liquid phase, has not been given a transferable description by a single potential. The usual approach for developing an empirical potential is to arrive at a parameterized functional form motivated by physical intuition and then to find a set of parameters by fitting to ab initio data or experimental observations, or both, for various atomic structures. A covalent material presents a difficult challenge because complex quantum mechanical effects, such as chemical bond formation, hybridization, metallization, charge transfer, and bond bending, must be described by an effective

interaction between atoms. In spite of a wide range of functional forms and fitting strategies, most of these models have been successful in the regime for which they were parameterized, but have shown a lack of transferability. Balamane and Halicioglu (1992) performed a comparative study of six classical many-body potentials, namely: the Pearson, Takai, Halicioglu and Tiller potential; the Biswas and Hamann potential; the Stillinger and Weber potential; the Dodson potential; the Tersoff potential (T2); and the modified Tersoff potential (T3). They concluded that each has strengths and limitations, none appearing clearly superior to the others, and none being fully transferable. The Stillinger-Weber potential (1985) was parameterized to experimental properties of solid and liquid silicon, such as the lattice energy and melting point. The model has the form of a third-order cluster potential (Carlsson, 1990) in which the potential energy is expressed as a linear combination of two- and three-body terms, as:

E = Σ_{i<j} v2(i,j) + Σ_{i<j<k} v3(i,j,k),   (2.1)

where v2(i,j) represents the two-body interactions and v3(i,j,k) represents the three-body interactions. The two-body interactions are expressed as:

v2(r_ij) = ε f2(r_ij / σ),   (2.2)

and

f2(r) = A (B r^(-p) - r^(-q)) exp[(r - a)^(-1)]   for r < a,

where ε has energy units and σ has length units. The three-body interaction is expressed as a separable product of radial functions g(r) and an angular function h(θ):

v3(i,j,k) = λ g(r_ij) g(r_ik) [cos(θ_ijk) + 1/3]^2,   (2.3)

where θ_ijk is the bond angle formed by the bonds ij and ik, and g(r) is a decaying function. The angular function has a minimum at θ = 109.5°, i.e. the tetrahedral angle, representing the angular preference for sp3 bonding. This potential gives a fairly realistic description of crystalline silicon, point defects, and the liquid and amorphous phases, but provides an inadequate description of under-coordinated or over-coordinated structures. Mistriotis et al. (1989) proposed an improvement over the Stillinger-Weber potential which included four-body interactions as well. It was able to give good agreement with the melting point of the crystal and with the geometries and energies of the ground and low metastable states of silicon clusters. Tersoff (1988; 1989) proposed an approach that couples two-body and multi-body correlations into the model. The potential is based on a bond order which takes the local environment into account. The bond strength depends on the geometry: the more neighbors an atom has, the weaker the bond to each neighbor. The energy is the sum of repulsive and attractive interactions which depend on the local bonding environment. Thus:

E = (1/2) Σ_{i≠j} f_C(r_ij) [f_R(r_ij) + b_ij(ζ_ij) f_A(r_ij)],   (2.4)

and

ζ_ij = Σ_{k≠i,j} f_C(r_ik) g(θ_ijk).

The f_R and f_A terms are repulsive and attractive pair potentials, respectively, and f_C is a cutoff function. The main feature of this form is the term b_ij. The idea is that the strength of each bond depends upon the local environment and is lowered when the number of neighbors is relatively high. The term ζ_ij defines the effective coordination number of

atom i, taking into account the relative distance of two neighbors (r_ij - r_ik) and the bond angle θ_ijk. The Bolding-Anderson potential (1990) is a generalized form of the Tersoff potential, in which the interaction between a pair of atoms depends on the environment. They suggested that clusters of four or more atoms have local minima on the potential energy surface corresponding to multiple configurations, which involve atoms with high coordination numbers and strained bond angles. The increase in potential energy because of the strained bond angles is offset by the large number of weak bonds formed. Their potential incorporated an angle dependence that is different for clusters and crystals by introducing interference functions. Bazant and Kaxiras (1997) developed the environment-dependent interatomic potential (EDIP) for bulk silicon. The interaction model includes two-body and three-body terms which depend on the local atomic environment through an effective coordination number, with the pair term given by:

V2(r, Z) = φ_R(r) + p(Z) φ_A(r),

where φ_R(r) represents the short-range repulsion of the atoms, φ_A(r) represents the attractive force of bond formation, and p(Z) is the bond order, which determines the strength of the attraction as a function of the atomic environment, measured by the coordination Z. The explanation of the bond order is that, as an atom becomes overcoordinated beyond its valence, the bonds become more metallic, characterized by delocalized electrons (Carlsson, 1985; 1990; Abell, 1985; Pettifor, 1990). They provided a test of empirical theories of covalent bonding in solids using a procedure to invert ab initio cohesive energy curves. They showed that the inversion of the ab initio cohesive

energy curves verified trends in chemical bonding across various bulk bonding arrangements, consistent with theoretical predictions (Bazant and Kaxiras, 1996). Their results indicated that a coordination-dependent pair interaction can provide a fair description of high-symmetry crystal structures without requiring additional many-body interactions, and that angular forces are needed only to stabilize the structure under symmetry-breaking distortions, primarily for small coordinations. In spite of these various approaches, no potential has demonstrated a transferable description of a material in all its forms, leading to the conclusion that it may be too ambitious to attempt a simultaneous fit of all of the important atomic structures (bulk, crystalline, surface, amorphous, liquid phase, and clusters), since qualitatively different aspects of bonding are at work in the different types of structures. Rahman and Raff (2001) presented an analytical fit to the ab initio data for the dissociation dynamics of vinyl bromide. Their approach was to develop analytic functions for each of the energetically open reaction channels using functional forms suggested by physical and chemical considerations. The parameters of these functions are expressed as appropriate functions of the system's geometry. These channel potentials are connected smoothly with parameterized switching functions that play a central role in fitting the potential barriers, the reaction coordinate curvatures, and the coordinate coupling terms obtained from the ab initio calculations. Ischtwan and Collins (1994) have developed a moving interpolation technique in which the potential energy in the neighborhood of any point is approximated by a Taylor series using inverse bond length coordinates as the expansion variables. The data from ab initio calculations are used to obtain a set of initial Taylor series expansions. The

procedure expresses the potential at any configuration as a weighted average of the Taylor series about all points in the data set. Subsequently, the results are iteratively improved by computing trajectories on the Taylor-series-fitted surface, and the internal coordinates are stored. These new points are added to the data set according to a weight factor that is determined by the relative density of points in the data set in the region of the new point. The criterion used to assess convergence of the iteratively improved potential to the final surface was computed using a dynamical variable, such as the reaction probability. Jordan et al. (1995) used the moving interpolation trajectory sampling method to investigate reaction dynamics. Maisuradze et al. (2003) introduced an interpolating moving least squares (IMLS) method for interpolation between the computed ab initio points. When the method is unrestricted, the least squares coefficients are obtained from the solution of a large matrix equation that must be solved repeatedly during a trajectory study. Another approach involves bypassing the physical motivation behind the functional form of the potential in favor of elaborate fitting, as described by Ercolessi and Adams (1994). The potential involves combinations of cubic splines with effectively hundreds of adjustable parameters, and the approach is the force-matching method, which involves fitting the potential to the ab initio atomic forces of many atomic configurations, including surfaces, clusters, liquids, and crystals. The potential form is defined by single-variable functions of the atomic coordinates, such as the glue potential, which is defined as:

V = Σ_{i<j} φ(r_ij) + Σ_i U(Σ_{j≠i} ρ(r_ij)),   (2.5)

where φ(r_ij) is a pair potential, ρ(r_ij) is a function of the atomic density, and U(n) is the glue function. They attempted to match the forces from first-principles calculations for a large

set of configurations with those predicted by a classical potential, by minimizing the objective function, namely,

Z(α) = Z_F(α) + Z_C(α),   (2.6)

where Z_F contains the ab initio data and Z_C consists of physical quantities from experimental results. If the emphasis is given to Z_F, then the minimizing potential appears as the best approximation of the first-principles system, with Z_C acting as a guide towards the correct region in the space of parameters. On the other hand, if the emphasis is on Z_C, then the method looks like a conventional fit, but the Z_F term relieves the burden of guessing the functional form. Another approach utilizes genetic algorithms and genetic programming. The idea is to let the computer program determine the functional form and come up with the best possible solution with minimal human input. The concept is based on genetics and evolution theory, in which different genetic representations of a set of variables are combined to produce a better solution. The technique is stochastic in nature rather than a conventional gradient-based approach, and hence the dimensionality of the problem does not pose a serious limitation. It works by randomly generating and exchanging functional elements (variables and arithmetic operators, such as addition and multiplication) and selecting the fittest individuals (which represent a set of parameters), assessed by comparison with target values; a population of potentials evolves until a superior form emerges. Makarov and Metiu (1998) proposed a procedure by which genetic programming could be used to find the best functional form and the best set of parameters to fit the energy and derivatives from ab initio calculations. They suggested a directed search, which guides the computer to use a specified functional form without

limiting too much the flexibility of the search. The directed search was used to fill in portions of a human-engineered functional template, without which the computer could not find a simple interpretable form even for a pair potential. As an alternative to these methods, Blank et al. (1993; 1995) explored an interpolation method based on feed-forward neural networks. Functional approximation using neural networks does not require any assumption regarding the functional form of the potential. Multilayer feed-forward neural networks can in principle fit any real-valued continuous function of any dimension to arbitrary accuracy, provided the network has a sufficient number of neurons (Cybenko, 1989). Blank et al. used a neural network to fit data sets produced by low-dimensional models of a CO molecule chemisorbed on a Ni(111) surface. The models were constrained versions of a many-dimensional empirical potential that had been used in simulations of surface dynamics, where it had been shown to give a reliable qualitative picture of the molecule-surface interaction. Hobday et al. (1999) developed a neural network model for the more complex C-N system, for which the potential energy function was not parameterized. They used the neural network to fit the many-body term in the Tersoff functional form rather than fitting the entire potential. In contrast to the conventional network architecture, where a single input vector is presented to the network, a set of N vectors is presented, where N is the number of neighbors of the two atoms i and j, not including i and j. The value of N depends on the i-j bond. For example, if the i-j bond is part of a monocyclic chain, then N = 2. If the i-j bond is tetrahedrally coordinated, then N = 6. To present the local environment to the network, the bond order term is expressed as a function of i-j-k contributions, where k is a first neighbor of i or j. The network input consists of N vectors, where N is the number of

different i-j-k contributions. Each vector of the input data consisted of pairwise information, such as direction cosines, bond lengths, and cutoff functions; first-neighbor information, such as bond lengths and torsional angles; and second-neighbor information, such as the number and types of atoms used to describe the i-j-k atom triplet. The size of each input vector was constant, but the number of input vectors N is a variable which can change during the simulation. Car and Parrinello (1985) developed a unified approach for molecular dynamics and density functional theory. In this method, the ions are still moved classically, but under the action of forces obtained by solving the electronic structure problem, thus eliminating the need for empirical potentials at the expense of much larger computational requirements. This technique is known as the Car-Parrinello molecular dynamics method. The electron density in terms of the occupied single-particle orthonormal orbitals ψ_i is written as:

n(r) = Σ_i |ψ_i(r)|^2.   (2.7)

A point on the Born-Oppenheimer potential energy surface is given by the minimum with respect to the ψ_i of the energy functional. In the conventional formulation, minimization of the energy functional with respect to the orbitals ψ_i, subject to the orthonormality constraint, leads to the self-consistent Kohn-Sham equations. Their solution involves repeated matrix diagonalization. Car and Parrinello (1985) adopted a different approach by regarding the minimization of the Kohn-Sham (1965) functional as a complex optimization problem, which could be solved by applying the optimization method called simulated annealing, introduced by Kirkpatrick et al. (1983). In this approach, an objective function is minimized relative to its parameters with a Boltzmann-type probability distribution via a

Monte Carlo procedure. In the Car-Parrinello method, the objective function is the total energy functional and the variational parameters are the coefficients of the expansion of the Kohn-Sham orbitals. They found that a simulated annealing strategy based on MD, rather than on the Metropolis Monte Carlo of Kirkpatrick et al. (1983), could be applied efficiently to minimize the Kohn-Sham functional. This technique, called dynamical simulated annealing, was found to be useful for studying finite-temperature properties.

2.2 Literature review of quantum mechanical calculations of silicon clusters

Raghavachari (1985; 1986; 1988) investigated the geometries and energies of silicon clusters using ab initio calculations. He considered several geometrical arrangements and electronic states. All geometries considered were optimized with several basis sets (the minimal basis set STO-3G, and 6-31G) at the Hartree-Fock level of theory, and in the case of open-shell species (triplets, quintets, etc.) unrestricted Hartree-Fock was used. Then single-point calculations were performed at these geometries using complete fourth-order perturbation (MP4) theory with the polarized 6-31G* basis set. For the clusters of five silicon atoms, the geometries considered were the trigonal bipyramid, square pyramid, planar pentagon, linear structure, and the tetrahedral structure (which is the equilibrium structure for the bulk). He found the trigonal bipyramid with a singlet state to be the most stable of all the structures considered. In the case of the square pyramid, an atom forms four equivalent bonds, all on one side of a plane, which is not a stable structure, and the molecule prefers to rearrange to the more stable trigonal bipyramid structure in spite of the lower number of bonds present. He found the tetrahedral structure of five silicon atoms to have a high energy. Though the central atom in such a cluster has

four bonds oriented in tetrahedral directions, the remaining four silicon atoms have only one bond each and are too distant from each other to interact favorably.

Fig 2.1 The geometries for the neutral and ionic clusters of Si2 to Si6 (Raghavachari, 1985). (Numbers in parentheses indicate bond lengths and bond angles for the ions.)

Fig 2.1 shows the ground-state equilibrium structures for the neutral and ionic structures of Si2 to Si6. Consideration of the structures Si3 to Si7 revealed that each cluster Si_n can be built from a smaller cluster Si_(n-1) by addition of a silicon atom at an appropriate edge-capped or face-capped bonding site. Edge-capped structures were found to be favored in the case of the smaller clusters, while the face-capped clusters became comparable in energy for the intermediate clusters. For example, the rhombus structure of Si4 can be considered as an edge-capped triangular form, which is stable compared to the face-capped tetrahedron. Many atoms in the larger clusters Si7 to Si10 were found to have a coordination number of 6 (greater than the 4 of the bulk). This indicated that, unlike in a close-packed metallic system, there may be a saturation of coordination at 6, with further bonding causing overcrowding and destabilization. He concluded that the principal differences between these clusters and the bulk were due to the large fraction of surface atoms in the clusters.

These surface atoms provide a large driving force (analogous to the surface tension) to form more compact structures, provided the resulting strain energy is not too high. In this sense there is a correspondence between the local coordination found in clusters and the high-pressure form of silicon (the β-tin structure), which has hexacoordinate silicon in an octahedral environment.

Fig. 2.2 Scaled cohesive energy vs. cluster size for neutral silicon clusters calculated at the MP4/6-31G* level (Raghavachari, 1985).

Fig. 2.2 shows the scaled cohesive energy as a function of the size of the cluster. The cohesive energy generally increases with cluster size, with some indication of special stability at sizes 4, 7 and 10; Si7 and Si10 approach the bulk cohesive energy most closely. Smaller clusters have low cohesive energy because of the difference in the number of bonds between the clusters and the bulk, and not because of the nature of the bonding. The local relative stabilities of clusters of different sizes can be compared through the incremental binding energy, defined as the energy required in the reaction Si_n → Si_{n-1} + Si. Fig. 2.3 shows the incremental binding energy as a function of cluster size at the HF/6-31G* and MP4/6-31G* levels.

Fig. 2.3 Incremental binding energy vs. cluster size for neutral silicon clusters calculated at the HF/6-31G* and MP4/6-31G* levels (Raghavachari and Rohlfing, 1988).

The peaks at cluster sizes of 4, 6, 7 and 10 indicate that these structures are locally stable, consistent with the prominent presence of sizes 6, 7 and 10 in the mass spectral distributions of these clusters.

CHAPTER 3

Molecular Dynamics Simulation

Molecular dynamics is a computer simulation technique in which the time evolution of a set of interacting atoms is followed by integrating their equations of motion. While the continuum mechanics approach provides a good understanding of processes in the macro and micro regimes, it cannot be implemented to simulate processes in the atomistic regime. Unlike finite element analysis and other continuum approaches, where the nodes and distances are selected arbitrarily, molecular dynamics is based on fundamental units of a specific material, such as the lattice spacing and interatomic distances. Thus, processes can be studied at the fundamental level. However, since a large number of atoms constitutes any common material, one has to consider the interactions of several thousand atoms even for a nanometric study. Such simulations require a large amount of computing resources, but in return they can provide insight into processes happening at the smallest, namely atomistic, level.

3.1 Brief history of molecular dynamics simulation

The molecular dynamics method was first introduced by Alder and Wainwright in the late 1950s (Alder and Wainwright, 1957, 1959) to study the interactions of hard spheres. The purpose of that work was to investigate the phase diagram of a hard-sphere system, and in particular its solid and liquid regions. In a hard-sphere system, particles

interact via instantaneous collisions and travel as free particles between collisions. Gibson et al. (1960) studied the dynamics of radiation damage; this was the first example of a molecular dynamics calculation with a continuous potential based on a finite-time-step integration method. Rahman (1964) carried out perhaps the first simulation using a realistic potential, studying the properties of liquid argon with a Lennard-Jones potential. Rahman and Stillinger (1974) carried out a simulation of liquid water. Verlet (1967) calculated the phase diagram of argon using the Lennard-Jones potential and computed correlation functions to test theories of the liquid state; the bookkeeping device that became known as the Verlet neighbor list was also introduced there, and the Verlet time-integration algorithm was used. Phase transitions in the same system were investigated by Hansen and Verlet (1969). The number of simulation techniques has expanded since then; there now exist many specialized techniques for particular problems, including mixed quantum mechanical-classical simulations.

3.2 Applicability of MD simulations

Molecular dynamics follows the laws of classical mechanics, whereas at the atomistic level the system actually follows the laws of quantum mechanics. A classical description of the system can be used when quantum effects, such as tunneling and electronic excitations, play no essential role in the dynamics. In addition, the kinetic energy of a particle must be large enough to ensure that its de Broglie wavelength is much smaller than the lattice constant of the solid. A simple test of the validity of the classical approximation is based on the de Broglie thermal wavelength, defined as:

Λ = h / √(2π M k_B T),   (3.1)

where h is Planck's constant, M is the atomic mass, k_B is Boltzmann's constant, and T is the temperature. The classical approximation is justified if Λ << a, where a is the mean nearest-neighbor distance. The classical approximation is poor for light elements, such as hydrogen and helium. Moreover, quantum effects become important in any system when T is sufficiently low: the drop in the specific heat of crystals below the Debye temperature and the anomalous behavior of the thermal expansion coefficient are well-known examples of measurable quantum effects in solids. Hence, molecular dynamics results should be interpreted with caution in these regimes.

An important issue in MD simulations is the temporal scale that can be explored. The maximum time step with which Newton's equations can be integrated numerically is inherently restricted to small values (about 1 fs) in order to reproduce atomic vibrations accurately. As a result, molecular dynamics simulations require long computational times, because the most interesting motions occurring during the simulated processes are very slow compared with the fast oscillations of bond lengths and bond angles that limit the integration time step. This limits the time scales that can be handled by MD in a reasonable amount of actual clock time. MD simulations are therefore usually limited to molecular processes happening in relatively short times, the so-called rare-event phenomenon, in which infrequent processes are completed quickly; the dissociation of a chemical bond, for example, is a rapid process that is completed on the picosecond time scale. However, the application of an external load takes longer by orders of magnitude than the dissociation time, making it difficult to observe such events at experimental speeds. In order to overcome the time-scale limitation of MD, two approaches are usually followed.
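As a quick numerical illustration (not part of the original study; the physical constants and the silicon nearest-neighbor spacing of ~2.35 Å used below are standard handbook values), the classical-validity criterion Λ << a of Eqn. (3.1) can be checked in a few lines:

```python
import math

H = 6.62607e-34      # Planck constant (J s)
KB = 1.380649e-23    # Boltzmann constant (J/K)
AMU = 1.66054e-27    # atomic mass unit (kg)

def thermal_wavelength(mass_amu, temp_k):
    """de Broglie thermal wavelength of Eqn. (3.1), in meters."""
    m = mass_amu * AMU
    return H / math.sqrt(2.0 * math.pi * m * KB * temp_k)

# Silicon (M ~ 28.09 amu) at 300 K vs. a nearest-neighbor spacing of ~2.35 A
lam = thermal_wavelength(28.09, 300.0)
print(lam / 2.35e-10)  # ratio << 1: the classical approximation holds
```

For hydrogen at the same temperature the ratio is larger by a factor of √(28.09/1.008) ≈ 5, which is why the classical approximation is poor for the light elements.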

In the first approach, coarse-graining, the dynamic variables are divided into slow and fast degrees of freedom, and averaging is performed over the fast degrees of freedom. In the second approach, fast (and rare) trajectories between desired states are computed explicitly, and their probabilities (relative to non-reactive trajectories) are calculated.

3.3 Formulation of the differential equations of motion

The forces between the atoms are computed explicitly, and the motion of the molecule is followed over a period of time using a suitable numerical integration method. Classical mechanics describes how physical objects move and how their positions change with time. Consider an isolated system consisting of N bodies with coordinates (x_i, y_i, z_i), where i = 1, 2, 3, ..., N (Goldstein, 1965). By an isolated system it is meant that all other bodies are sufficiently remote to have a negligible influence on the system. Each of the N bodies is assumed to be small enough to be treated as a point particle. The position of the i-th body with respect to a given inertial frame is denoted r_i(t); its velocity and acceleration are given by v_i(t) = ṙ_i(t) and a_i(t) = r̈_i(t). Each atom has a mass m_i. From Newton's second law:

F_i = m_i a_i,

where F_i is the total force acting on the atom. This force is composed of a sum of forces due to each of the other interacting atoms in the system. If the force on the i-th body due to the j-th body is denoted F_ij, then

F_i = Σ_{j=1, j≠i}^{N} F_ij,

where F_ii is zero, since there is no force on the i-th atom due to itself. Furthermore,

F_i = m_i dv_i/dt = dp_i/dt,   (3.2)

where p_i is the momentum of the i-th body.
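Because the pair forces obey Newton's third law, F_ij = −F_ji, the internal forces cancel in the sum over all pairs, and the total momentum of an isolated system is conserved. A minimal 1-D sketch (the spring-like pair force and the coordinates below are arbitrary illustrative choices):

```python
def pair_force(xi, xj):
    """Force on atom i due to atom j: a 1-D spring of rest length 1.
    Antisymmetric by construction, so pair_force(a, b) == -pair_force(b, a)."""
    dx = xi - xj
    return -(abs(dx) - 1.0) * (1.0 if dx > 0 else -1.0)

positions = [0.0, 0.7, 2.5]
total = sum(pair_force(positions[i], positions[j])
            for i in range(len(positions))
            for j in range(len(positions)) if i != j)
print(abs(total) < 1e-12)  # True: internal forces sum to zero
```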

The force on atom i is the negative gradient of the potential energy function with respect to the coordinates of atom i:

F_i = −∇_i V(r_1, r_2, ..., r_N),   (3.3)

where V is the potential energy function, r_i is the position vector of atom i, and x_i, y_i, and z_i are the Cartesian coordinates of atom i:

r_i = x_i î + y_i ĵ + z_i k̂,   and   ∇_i = î ∂/∂x_i + ĵ ∂/∂y_i + k̂ ∂/∂z_i.   (3.4)

Now, the force in the X-direction on atom i is given by:

F_x = dp_x/dt = m d(v_x)/dt = −∂V/∂x.   (3.5)

The velocity of atom i in the X-direction is:

v_x = dx/dt = p_x/m.   (3.6)

Similar equations can be derived for the Y- and Z-directions.

It should be noted that to update the position of one atom, one has to solve 6 coupled first-order differential equations. If the number of atoms in the system is N, then the number of differential equations to be solved is 6N. If the potential function is a simple pairwise potential, such as the Lennard-Jones or Morse potential, then the total number of terms in such a potential is N(N−1)/2. Thus, as an example, a system of 2000 atoms requires the integration of 12,000 coupled first-order differential equations of motion, and about 2 million pairwise terms need to be calculated each time a derivative is evaluated. Such evaluations must be done at every integration step of every trajectory calculation. The computational time therefore increases rapidly as the number of atoms in the system increases.
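The relation F_x = −∂V/∂x of Eqn. (3.5) is easy to verify numerically with a central difference. The double-well potential below is an arbitrary toy choice for illustration:

```python
def potential(x):
    """Toy 1-D potential V(x) = (x^2 - 1)^2."""
    return (x * x - 1.0) ** 2

def analytic_force(x):
    """F = -dV/dx = -4 x (x^2 - 1)."""
    return -4.0 * x * (x * x - 1.0)

def numeric_force(x, h=1e-6):
    """Central-difference estimate of -dV/dx."""
    return -(potential(x + h) - potential(x - h)) / (2.0 * h)

print(abs(numeric_force(0.5) - analytic_force(0.5)) < 1e-8)  # True
```

In production codes the forces are evaluated analytically whenever possible; a finite-difference gradient like this is mainly useful as a consistency check of a hand-derived force routine.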

3.4 Time integration algorithms

The MD trajectories are defined by both position and velocity vectors, and they describe the time evolution of the system. Accordingly, the positions and velocities are propagated with a finite time interval by solving a set of coupled first-order differential equations. Time integration algorithms are based on finite difference methods, where time is discretized on a finite grid. They are based on a Taylor series expansion truncated after some finite number of terms. The truncation error varies as (Δt)^n, where Δt is the time step and n is the order of the method (which depends on the number of time derivatives considered in the expansion). Hence, for higher-order methods, the truncation error drops off rapidly even with a small reduction in the time step. The idea of the numerical integration of Newton's equations of motion is to find an expression that defines the positions at time t + Δt in terms of the already known positions and their time derivatives, such as the velocities and accelerations, at time t. Thus:

r(t + Δt) = r(t) + v(t)Δt + (1/2)a(t)Δt²,
v(t + Δt) = v(t) + a(t)Δt + (1/2)b(t)Δt²,
a(t + Δt) = a(t) + b(t)Δt.   (3.7)

3.4.1 Verlet algorithm

This is perhaps the most widely used method of integrating the equations of motion, initially developed by Verlet (1967) and attributed to Störmer (Gear, 1971). The Verlet algorithm uses the acceleration at time t and the positions at times t and t − Δt to calculate the new positions at time t + Δt. The algorithm is time reversible and, given a conservative force field, it guarantees the conservation of linear momentum.

r(t + Δt) = r(t) + v(t)Δt + (1/2)a(t)Δt² + ...,   (3.8 A)
r(t − Δt) = r(t) − v(t)Δt + (1/2)a(t)Δt² − ....   (3.8 B)

Adding Eqns. (3.8 A) and (3.8 B), and neglecting the higher-order terms,

r(t + Δt) = 2r(t) − r(t − Δt) + a(t)Δt²,

and the velocity is computed as

v(t) = [r(t + Δt) − r(t − Δt)] / (2Δt).

The basic Verlet algorithm does not use the velocities explicitly, which may introduce numerical imprecision, since the local error in updating the positions is proportional to Δt⁴, whereas the error in updating the velocities is proportional to Δt².

3.4.2 Leap-frog Verlet algorithm

This is a modification of the Verlet algorithm in which the velocities are computed at half time steps as well. The algorithm is given as:

r(t + Δt) = r(t) + v(t + Δt/2)Δt,
v(t + Δt/2) = v(t − Δt/2) + a(t)Δt.   (3.9)

The advantage of this method is that the velocities are explicitly calculated, but the disadvantage is that they are not calculated at the same time as the positions. The velocities at time t can be approximated as:

v(t) = (1/2)[v(t − Δt/2) + v(t + Δt/2)].   (3.10)
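A minimal sketch of the basic Verlet step applied to a harmonic oscillator (the unit-frequency oscillator and the step size are illustrative choices, not from the thesis):

```python
import math

def verlet_step(x, x_prev, a, dt):
    """Basic Verlet: r(t+dt) = 2 r(t) - r(t-dt) + a(t) dt^2."""
    return 2.0 * x - x_prev + a(x) * dt * dt

a = lambda x: -x                  # harmonic oscillator, exact period 2*pi
dt = 0.01
x_prev = 1.0                      # x(0), started at rest
x = 1.0 - 0.5 * dt * dt           # x(dt) from a Taylor expansion
steps = int(round(2.0 * math.pi / dt))   # one full period
for _ in range(steps - 1):
    x, x_prev = verlet_step(x, x_prev, a, dt), x
print(round(x, 2))  # back near the starting point x = 1 after one period
```

Note that the method needs two previous positions to advance, which is why the second point x(dt) has to be supplied from a Taylor expansion before the loop can start.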

3.4.3 Velocity Verlet algorithm

The velocity Verlet algorithm provides a better description of the velocities, without the need to compute the velocities at half time steps:

r(t + Δt) = r(t) + v(t)Δt + (1/2)a(t)Δt²,
v(t + Δt) = v(t) + (1/2)[a(t) + a(t + Δt)]Δt.   (3.11)

3.4.4 Beeman algorithm

The Beeman algorithm (Beeman, 1976) provides a more accurate expression for the velocities:

r(t + Δt) = r(t) + v(t)Δt + (2/3)a(t)Δt² − (1/6)a(t − Δt)Δt²,
v(t + Δt) = v(t) + (1/3)a(t + Δt)Δt + (5/6)a(t)Δt − (1/6)a(t − Δt)Δt.   (3.12)

It should be noted that these algorithms assume that the force remains constant while the positions are updated from t to t + Δt. To circumvent this assumption, higher-order terms involving higher time derivatives would have to be included in the algorithm. It may be noted that the above-mentioned methods are self-starting, i.e., it is only necessary to know the positions and velocities at time t = t_0. An estimate of the accuracy during the integration can be obtained by monitoring the total energy and the angular momentum of the system, which should be conserved. Step-size reduction and back integration are other means of checking the accuracy of the integration results. Predictor-corrector methods provide an automatic error estimate at each integration step, allowing the program to employ a variable step size to achieve a specified accuracy. However, these methods are not self-starting; it is therefore necessary to use velocity Verlet or a similar single-step method to start the integration.
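The velocity Verlet step of Eqn. (3.11) can be sketched and checked for the energy conservation mentioned above, again on a toy harmonic oscillator (an illustrative test problem, not from the thesis):

```python
def velocity_verlet_step(x, v, a_old, accel, dt):
    """One velocity Verlet step (Eqn. 3.11)."""
    x_new = x + v * dt + 0.5 * a_old * dt * dt
    a_new = accel(x_new)
    v_new = v + 0.5 * (a_old + a_new) * dt
    return x_new, v_new, a_new

accel = lambda x: -x       # harmonic oscillator; energy E = 0.5 v^2 + 0.5 x^2
x, v = 1.0, 0.0
a = accel(x)
e0 = 0.5 * v * v + 0.5 * x * x
for _ in range(10000):
    x, v, a = velocity_verlet_step(x, v, a, accel, 0.01)
e1 = 0.5 * v * v + 0.5 * x * x
print(abs(e1 - e0) < 1e-4)  # True: the energy drift stays small
```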

3.5 Methods to improve computational speed

3.5.1 Neighbor lists and bookkeeping techniques

In MD/MC simulations the distances from atom i to all other atoms in the system are computed. For the calculation of the potential and forces, only the atoms that are separated by distances smaller than the potential cutoff are considered. The time to calculate all pair bond distances is proportional to N², where N is the number of atoms in the system. Verlet (1967) suggested a technique for improving the speed by maintaining a list of the neighbors of a particular atom, which is updated at intervals. Between updates of the neighbor list, only the bond distances of atoms appearing in the list are computed. This procedure reduces the computational time to some extent.

3.5.2 Cell structures and linked lists

As the size of the system increases, the conventional neighbor list becomes too large to store easily, and testing every pair in the system is inefficient. An alternative method of keeping track of neighbors for large systems is the cell index method (Quentrec and Brot, 1975; Hockney and Eastwood, 1981). The cubic simulation box is divided into a regular lattice of M x M x M cells; a 2-dimensional representation is shown in Fig 3.1. The cells are chosen such that the side of each cell, l = L/M, is greater than the cutoff distance of the potential.
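A toy O(N²) build of the Verlet neighbor list described above (the coordinates and list radius are arbitrary illustrative values; in practice the list radius is the potential cutoff plus a "skin", so that the list remains valid for several steps between updates):

```python
def build_neighbor_list(coords, r_list):
    """For each atom i, store the indices j > i of atoms within r_list."""
    n = len(coords)
    nbrs = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((coords[i][k] - coords[j][k]) ** 2 for k in range(3))
            if d2 < r_list * r_list:
                nbrs[i].append(j)   # each pair stored once
    return nbrs

atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
print(build_neighbor_list(atoms, 1.5))  # [[1], [], []]
```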

Fig 3.1 Cell index method

The first part of the method involves sorting all atoms into the appropriate cells. For this 2-dimensional example, the neighbors of any atom in cell 13 can be found in cells 13, 7, 8, 9, 12, 14, 17, 18 and 19. Searching for the neighbors is now fast and efficient. For a 2-dimensional system there are approximately N_c = N/M² atoms in each cell, and for a 3-dimensional system there would be N_c = N/M³ atoms in each cell. Using the cell structure method, only 9NN_c pairs have to be computed for a 2-dimensional system, and 27NN_c pairs in a 3-dimensional system, as opposed to ½N(N−1) in the conventional way. The cell structure can be set up and implemented as a linked list. The linked-list array stores the number of the next atom in the cell, i.e., the list array element for an atom gives the next atom in the same cell, and so on.

3.6 Periodic boundary conditions (PBC)

No matter how large the simulated system is, its number of atoms would be negligible compared with the number of atoms that make up a macroscopic specimen (of the order of 10²³). As a result, in the simulation model the ratio of the number of surface atoms to the total number of atoms would be much larger than in reality, causing surface effects to be much more important than they actually are. A solution to this problem is the use of periodic boundary conditions. A periodic boundary is a mathematical formulation

to make a simulation that contains only a few hundred atoms behave as though it were infinite in size. In doing so it also removes the surface effects, since atoms near the boundary would otherwise have fewer neighbors than the atoms inside. When PBCs are used, the atoms are considered to be enclosed in a box, and this box is replicated by translation in all three Cartesian directions to form an infinite lattice. In effect, every atom in the simulation box has an exact duplicate in each of the surrounding cells, with the same velocity.

Fig 3.2 Representation of the periodic boundary condition (PBC) and the simulation cell

Fig 3.2 shows the representation of the periodic boundary along with the main simulation cell; r_cut is the cutoff used in the potential. If an atom is located at a position r in the box, then this particle represents an infinite set of particles located at:

r + la + mb + nc,   (3.13)

where l, m and n are integer numbers, and a, b and c are the vectors corresponding to the edges of the box. All these image particles move together, but only one of them is represented in the computer program. Each particle should be thought of as interacting not only with the other particles in the box but also with their images in the surrounding boxes. Whenever an atom leaves the simulation cell, it is replaced by another with the

same velocity entering from the opposite cell face, so the number of atoms in the simulation cell is conserved. Therefore no atom feels any surface effects. An artifact of this technique is that if the simulation cell is too small and the cutoff radius is greater than half the length of the simulation cell, then an atom i can interact with atom j as well as with its image j′ in a neighboring cell. As a result, the interaction between atoms i and j is effectively counted twice, as i-j and i-j′. The minimum image criterion states that if the box size is greater than 2r_c, where r_c is the cutoff radius of the potential, then among all the pairs formed by an atom i in the box and the set of all periodic images of an atom j, at most one will interact. To demonstrate this, suppose an atom i interacts with two images j′ and j″ of an atom j. These two images must be separated by a distance of at least 2r_c. In order to interact with both j′ and j″, i would have to be within a distance r_c of each of them, which is impossible since they are separated by more than 2r_c. Using the minimum image criterion, only the closest image of an atom j, if it lies within the cutoff radius r_c, will interact with an atom i. Hence, it is very important that the box size of the simulation cell be at least twice the cutoff radius specified for the potential. Long-range interactions, such as Coulombic interactions with a range larger than the simulation cell, are treated using the Ewald summation, which was developed for studying ionic crystals (Ewald, 1921; Madelung, 1918). Using the minimum image criterion, the periodic boundaries are modeled by modifying the distance calculation. If the difference between the x-coordinates of atoms i and j is denoted dx, then when dx is greater than half the box size in the x-direction (L/2), atom j does not interact with atom i; instead, its image j′ in a surrounding cell is considered to be bonded with atom i.
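Re-entry of atoms through the opposite face can be implemented by wrapping each coordinate back into the primary box. A minimal sketch (the box length is an arbitrary illustrative value):

```python
def wrap(x, box):
    """Map coordinate x back into the primary box [0, box): an atom
    leaving through one face re-enters through the opposite face."""
    return x % box

# an atom that drifts to x = 10.3 in a box of length 10 re-enters near 0.3,
# and one at x = -0.2 re-enters near 9.8
print(wrap(10.3, 10.0), wrap(-0.2, 10.0))
```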

Alternatively:

if dx > L/2, then dx = dx − L;
if dx < −L/2, then dx = dx + L.   (3.14)

Equivalently,

dx = dx − L · anint(dx/L),   (3.15)

where L is the box size in that particular Cartesian direction and anint is a function that rounds the ratio dx/L to the nearest integer value. A similar procedure is applied in the Y- and Z-directions, and the distance is computed as:

√(dx² + dy² + dz²).   (3.16)
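Eqn. (3.15) translates directly into code. One caveat of the sketch below: Python's round() performs banker's rounding at exactly half-integer ratios, unlike Fortran's anint, but away from that edge case the two agree. The box length is an illustrative value:

```python
import math

def minimum_image(d, box):
    """Minimum image separation along one axis (Eqn. 3.15)."""
    return d - box * round(d / box)

def distance(ri, rj, box):
    """Minimum image distance between two atoms (Eqn. 3.16)."""
    return math.sqrt(sum(minimum_image(ri[k] - rj[k], box) ** 2
                         for k in range(3)))

# atoms at x = 0.5 and x = 9.5 in a box of L = 10 are 1.0 apart, not 9.0
print(distance((0.5, 0.0, 0.0), (9.5, 0.0, 0.0), 10.0))  # 1.0
```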

CHAPTER 4

Interatomic Potentials

The interatomic potential used in a simulation to model the lattice structure plays an important role in determining the accuracy of the simulation results. The accuracy of the trajectories obtained from an MD simulation is affected by the choice of the potential energy function. The total energy of the system is the sum of the kinetic energy (KE) and the potential energy (PE). The KE is simple to compute, but the PE computation is complex, since it depends on the positions of all the interacting atoms. The complexity of the potential also determines the computational time required for the simulation. The force acting on an atom is proportional to the first derivative of the potential energy function, and the largest part of a molecular dynamics simulation is the evaluation of the forces, which are required to calculate the new positions of the atoms. The potential energy can be obtained using two approaches. One approach is to perform ab initio calculations by solving the Schrödinger equation for the entire system at each integration step. Although this is theoretically the best approach (without any arbitrary assumptions or ad hoc parameterization), it is not feasible to perform such calculations for the entire system at every time step. For larger systems, comparatively simple and computationally efficient empirical potential functions are used, which take into account factors such as bond stretching, bending and torsion, covalent bonding, and non-bonded (van der Waals and Coulomb) interactions.
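The point that the KE is cheap while the PE is not can be seen directly: the kinetic energy is a single independent sum over atoms, KE = Σ ½ m_i v_i², with no reference to the positions of the other atoms. A minimal sketch with made-up masses and velocities:

```python
def kinetic_energy(masses, velocities):
    """KE = sum of 0.5 * m * |v|^2 over all atoms; each term depends
    on one atom only, unlike the potential energy."""
    return sum(0.5 * m * (vx * vx + vy * vy + vz * vz)
               for m, (vx, vy, vz) in zip(masses, velocities))

print(kinetic_energy([2.0, 1.0], [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]))  # 3.0
```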

4.1 Design of interatomic potentials

The traditional approach is to eliminate the electronic degrees of freedom completely and to move the nuclei using some appropriate analytical potential function. The functional form of this potential is chosen so as to mimic the behavior of the true potential in realistic ways for specific materials under specific conditions. Constructing the potential involves two steps:

1. Selecting an analytical form for the potential. This could be a sum of pairwise terms, where the energy of the pair depends on the relative distance, or a many-body form appropriate for a specific type of bonding, involving bond distances, angles, and coordination.

2. Once the form is decided, the next step is to determine a specific parameterization of the functions. A set of parameters is found which fits data from ab initio calculations or experimental properties, such as the density, cohesive energy, and phonon frequencies, or a data set containing some combination of theoretical and experimental observations.

The most important factor is the range of applicability of the potential energy function. Since an ad hoc functional form is used, it has no rigorous theoretical foundation, and once the empirical potential is developed there is no straightforward means to improve it. Normally, the parameters are adjusted to equilibrium data, which means that the potential can be used to model the bulk properties of the crystal lattice close to equilibrium conditions. So it is unlikely that the energies and forces of amorphous structures will be accurately represented by the potential. For example, a potential function designed for the bulk may not be appropriate for studying bond dissociation or a

diatomic molecule of the same element, since the environment would be very different from the one for which the potential was designed. However, it may be possible to model both the bulk and the surface, where the environment differs due to the reduced coordination of the atoms at the surface. These problems are intractable, and one should be critical when using analytical potentials for simulating high-temperature and high-pressure conditions.

4.2 Classification of potential energy functions

The empirical potentials are further classified into two-body, three-body and many-body potential forms. Most empirical potential forms comprise an attractive and a repulsive term. The term "empirical" may be misleading, as such potentials may not be strictly empirical as the term suggests (Komanduri and Raff, 1999). These potentials can provide a more realistic model than potentials derived from purely theoretical considerations (Torrens, 1972), since their parameterization can take into account some physical properties as well. Empirical potential forms are based on mathematical expressions for the pairwise interaction between two atoms or ions, which may or may not be justified from theory. These forms can be expressed in terms of attractive and repulsive terms. By convention, repulsive forces are considered positive and attractive forces negative. As the distance between two atoms decreases, both the attractive and repulsive forces increase; the repulsive forces increase more rapidly than the attractive forces. The curvature of the potential energy function is mainly determined by the repulsive force component, which in turn also governs the elastic behavior of the solid. When the attractive and repulsive forces exactly balance each other, the system is at the equilibrium position, with the potential energy at its minimum, which is also the bond energy. The cohesive properties of solids,

such as the melting and vaporization behavior, are determined by the magnitude of the maximum binding energy, which is governed by the attractive force component. The higher the bonding energy, the higher the melting temperature and the elastic modulus, and the lower the coefficient of thermal expansion. Several empirical potential functions have been developed, suitable for different materials. Some examples of simple pair potentials are the Lennard-Jones (also known as the 6-12) potential (1925), the Morse potential (1929), and the Born-Mayer potential (1933). For complex systems, such as those involving covalent bonding, common potential functions are the Stillinger-Weber (1985), Tersoff (1988, 1989), Bolding-Andersen (1990), and Brenner (1990) potentials, as well as the embedded atom and modified embedded atom method (MEAM) potentials (Baskes, 1989).

Cutoff radius

In general, each atom can interact with all the other atoms in the simulation. One method of increasing the computational speed is to limit the range of the potential. Normally the potential functions are truncated at a certain value of the bond distance, called the cutoff radius. By neglecting the interactions beyond the cutoff radius, the computational time is significantly reduced with very little loss in accuracy. Truncation of the potential energy function also results in the truncation of the forces. The cutoff distance is generally taken as the distance where the magnitude of the potential energy is ~3-5% of the equilibrium potential energy. Consequently, a finite cutoff results in small fluctuations in the total energy instead of a strictly constant energy.
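The 3-5% rule of thumb for choosing the cutoff can be made concrete with the Lennard-Jones 6-12 potential just mentioned (in reduced units ε = σ = 1, an illustrative choice): scan outward until the well has decayed to 5% of its depth.

```python
def lj(r, eps=1.0, sigma=1.0):
    """Lennard-Jones 6-12 pair potential in reduced units."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)

# the well depth is eps; scan outward for the smallest cutoff at which
# |V| has fallen to 5% of the well depth
rc = 1.5
while abs(lj(rc)) > 0.05:
    rc += 0.01
print(round(rc, 2))  # a cutoff slightly above 2 sigma
```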

4.2.1 Pair potentials

Pair potentials are applicable for many metals in atomistic studies. Pair potentials can be classified into two basic categories (Vitek, 1996): one in which the potential determines the total energy of the system, and a second type that determines the change in energy when a configuration varies under constant-density conditions. The Lennard-Jones potential (Lennard-Jones, 1925) is given as:

V_LJ(r) = 4ε[(σ/r)¹² − (σ/r)⁶],   (4.1)

where ε and σ are parameters that determine the well depth and the length scale of the potential, respectively (the potential crosses zero at r = σ and has its minimum at r = 2^(1/6)σ). One of the most commonly used forms of the pair potential is the Morse potential (Morse, 1929). It is used to model the interaction between atoms of two different materials. It is given as:

V(r) = D[e^(−2α(r−r_e)) − 2e^(−α(r−r_e))],
or V(r) = D[1 − e^(−α(r−r_e))]²,   r ≤ r_c,   (4.2)

where r is the bond distance between the two atoms, and r_c is the cutoff radius beyond which the potential is truncated to zero. The potential values computed using the two forms given in Eqn. (4.2) differ by the constant D, and as a result the forces computed using the two forms are the same; either form can be used in a simulation. The potential parameters D, α, and r_e are adjusted to the measured sublimation enthalpy, Debye temperature, and nearest-neighbor spacing of the material. Girifalco and Weizer (1959) calculated Morse potential parameters using experimental values for the energy of vaporization, lattice constants and compressibility. The Morse potential can model FCC metals well, but not BCC metals or covalent materials. A general feature of any

physically reasonable pair potential is that it favors the formation of close-packed structures. Pair potentials are therefore unsuitable for describing covalent systems, which assume more open structures.

4.2.2 Many-body potentials

Pair potentials, such as the Lennard-Jones potential, are suitable for modeling the interactions of inert gases, such as helium, argon, and xenon. A covalent material represents a difficult challenge because of complex quantum mechanical effects, such as chemical bond formation, hybridization, metallization, charge transfer, and bond bending, which must be described by an effective interaction between atoms. For instance, silicon undergoes a series of structural phase transitions under pressure, from the tetrahedral structure to a denser phase called β-tin, to simple cubic, to FCC. This indicates that the energy differences between these structures are not too large, which suggests that the cohesive energy is nearly independent of the coordination. A pairwise potential model would favor the more closely packed structure; consequently, no reasonable pair potential can stabilize the diamond tetrahedral structure without considering angular terms. Various methods have been introduced to circumvent this problem. Several authors have introduced potentials with a strong angular dependence (Stillinger and Weber, 1985; Biswas and Hamann, 1985; Baskes, 1987; Baskes et al., 1989). Pettifor (1989) developed a similar approach from tight binding. Models based on the bond charge (Brenner and Garrison, 1985) and on a function of the local coordination (Tersoff, 1986) have also been developed. Most of these models have been successful in the regime for which they were parameterized, but have shown a lack of transferability. A survey of six such potentials (Balamane and Halicioglu, 1992) concluded that each has strengths and limitations, none appearing clearly superior to the

others, and none being fully transferable. More recently, the environment-dependent interatomic potential (EDIP) (Justo et al., 1997, 1998; Bazant et al., 1997) has been developed as an improvement over the previous models. Another group of potentials (Kane, 1985; Keating, 1966) is constructed to accurately describe small distortions from the ground state in complex systems, such as the diamond structure of semiconductors. The Keating model uses a Taylor series expansion of the energy about its minimum. It can give accurate descriptions of small displacements, but becomes progressively less accurate for large displacements. Such potentials are useful for describing phonons and elastic deformations.

In general, any many-body potential energy function describing the interactions among N identical particles can be resolved into one-body, two-body, three-body, etc., contributions as follows:

E = Σ_i V₁(i) + Σ_{i<j} V₂(i,j) + Σ_{i<j<k} V₃(i,j,k) + ... + V_N(1, ..., N).   (4.3)

The single-particle potential V₁ describes external forces. The first term that describes the interactions of atoms is the second term, which has the form of a pair potential. In order for this description to be useful in theoretical modeling, it is necessary that the component functions V_n converge quickly to zero with increasing n. The first step towards a many-body form is to include three-body terms, by favoring the bond angles corresponding to those of the diamond structure. Stillinger and Weber (1985) proposed such a potential, which has the form:

V = Σ_{ij} φ(r_ij) + Σ_{ijk} g(r_ij) g(r_ik) [cos θ_ijk + 1/3]²,   (4.4)

where θ_ijk is the bond angle formed by the bonds ij and ik, and g(r) is a decaying function with a cutoff between the first and second neighbors. The bond angles in the diamond

tetrahedral structure are ~109.5°, and the cosine of this angle is ~ −1/3. This makes the term [cos θ_ijk + 1/3]² go to zero for θ ~ 109.5°, which makes the tetrahedral structure more stable than a compact structure. This potential gives a fairly realistic description of crystalline silicon. However, its built-in tetrahedral bias creates problems in other situations: for example, it cannot predict the right energies of the non-tetrahedral structures found under pressure, the coordination of the liquid is too low, and the surface structures are not correct. Consequently, this potential is not transferable to situations other than those it was designed for. Work by Carlsson (1990) showed that to build a realistic potential, analytic forms should be considered which take the local environment into account, with the bond strengths depending on it. The work of Biswas and Hamann (1987) suggests that a three-body potential is not adequate for accurately describing the cohesive energy of silicon over a wide range of bonding geometries and coordinations. However, a general form for a four-body or five-body potential becomes intractable, as it would contain too many parameters. Instead of a general N-body form, Tersoff (1988; 1989) proposed an approach that couples the two-body and many-body correlations into the model. The potential is based on the bond order, which takes the local environment into account. The bond strength depends on the geometry: the more neighbors an atom has, the weaker the bond to each neighbor. If the energy per bond decreased rapidly with increasing coordination, then diatomic molecules would be the most stable arrangement of atoms. On the other hand, if the bond order depended only weakly on coordination, close-packed structures would be favored, as these maximize the number of bonds. Silicon undergoes structural transitions with varying coordination under pressure, with small differences in cohesive energy (Yin

and Cohen, 1980, 1982, 1984; Chang and Cohen, 1984). This suggests that the decrease in bond strength with increasing coordination number almost cancels the increase in the number of bonds over a range of coordinations (Tersoff, 1988). The Tersoff potential (Tersoff, 1989) is given as:

E = Σ_i E_i = (1/2) Σ_i Σ_{j≠i} V_ij,
V_ij = f_C(r_ij) [ f_R(r_ij) + b_ij f_A(r_ij) ],   (4.5 a)

where E is the total potential energy of the system, f_R and f_A are repulsive and attractive pair potentials, respectively, and f_C is a cutoff function:

f_R(r_ij) = A_ij exp(-λ_ij r_ij),
f_A(r_ij) = -B_ij exp(-µ_ij r_ij),   (4.5 b)

f_C(r_ij) = 1,                                          r_ij < R,
f_C(r_ij) = 1/2 + (1/2) cos[π (r_ij - R)/(S - R)],      R < r_ij < S,
f_C(r_ij) = 0,                                          r_ij > S,

where r_ij is the bond distance between atoms i and j, S is the cutoff radius, and A_ij, B_ij, λ_ij, µ_ij, and R are potential parameters. The main feature of this form is the term b_ij. The idea is that the strength of each bond depends upon the local environment and is lowered when the number of neighbors is relatively high. This dependence is expressed by the parameter b_ij, which can diminish the attractive force relative to the repulsive force:

b_ij = χ_ij (1 + β_i^{n_i} ζ_ij^{n_i})^{-1/(2 n_i)},
ζ_ij = Σ_{k≠i,j} f_C(r_ik) g(θ_ijk),   (4.5 c)
g(θ_ijk) = 1 + c_i²/d_i² - c_i²/[d_i² + (h_i - cos(θ_ijk))²].

The term ζ_ij defines the effective coordination number of atom i, taking into account the relative distance between two neighbors (r_ij - r_ik) and the bond angle θ_ijk. The function g(θ_ijk)

has a minimum at h = cos(θ); the parameter d determines how sharp the dependence on the angle is, and c expresses the strength of the angular effect. The parameters R and S are not optimized but are chosen so as to include first neighbors only for several selected high-symmetry structures, such as graphite, diamond, simple cubic, and face-centered cubic. The parameter χ_ij strengthens or weakens heteropolar bonds in multi-component systems; χ_ii = 1 and χ_ij = χ_ji. Also ω_ii = 1. The potential was first calibrated for silicon (Tersoff, 1988a) and then for carbon (Tersoff, 1988b). The parameters were chosen to fit theoretical and experimental data, such as cohesive energies, lattice constants, and bulk moduli obtained from realistic and hypothetical configurations. Table 4.1 lists the parameters for carbon, silicon, and germanium.

Table 4.1 Parameters of the Tersoff potential (Tersoff, 1989)

Parameter   C               Si              Ge
A (eV)      1.3936 x 10^3   1.8308 x 10^3   1.769 x 10^3
B (eV)      3.4674 x 10^2   4.7118 x 10^2   4.1923 x 10^2
λ (Å^-1)    3.4879          2.4799          2.4451
µ (Å^-1)    2.2119          1.7322          1.7047
β           1.5724 x 10^-7  1.1000 x 10^-6  9.0166 x 10^-7
n           7.2751 x 10^-1  7.8734 x 10^-1  7.5627 x 10^-1
c           3.8049 x 10^4   1.0039 x 10^5   1.0643 x 10^5
d           4.3484          1.6217 x 10^1   1.5652 x 10^1
h           -5.7058 x 10^-1 -5.9825 x 10^-1 -4.3884 x 10^-1
R (Å)       1.8             2.7             2.8
S (Å)       2.1             3.0             3.1

Brenner (1990) developed an empirical many-body potential for hydrocarbons which can model intramolecular chemical bonding in a variety of small hydrocarbon molecules as well as the graphite and diamond lattices. The potential function was based on Tersoff's covalent bonding formulation, with additional terms to correct an inherent overbinding of radicals. Nonlocal effects were incorporated via an analytic function that defines conjugation based on the coordination of carbon atoms. The potential was fitted

to the binding energy of the C2 diatomic molecule and the binding energies and lattice constants of the graphite, diamond, simple cubic, and face-centered cubic structures. Dyson and Smith (1996) extended the Brenner potential to the C-Si-H system to model chemical vapor deposition (CVD) diamond growth on a silicon substrate. Parameters for the Si-Si and Si-C interactions were fitted to the diatomic bond energies and to bulk properties such as the bulk modulus, the experimental cohesive energy, the experimental lattice constant (instead of the bond length), and phonon mode frequencies.
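To make the bond-order form concrete, the following minimal sketch (a Python illustration, not code from this thesis) evaluates the Tersoff pair term of Eq. (4.5 a-b) for a single Si-Si bond, taking the bond order b_ij = 1, as for an isolated dimer with no third neighbor contributing to ζ_ij. The silicon parameter values follow Tersoff (1989); treat the specific numbers as illustrative.

```python
import math

# Tersoff pair term for one Si-Si bond with b_ij fixed at 1 (dimer limit).
# Parameters are Tersoff's (1989) silicon values (illustrative).
A, B = 1.8308e3, 4.7118e2      # eV
LAM, MU = 2.4799, 1.7322       # 1/Angstrom
R, S = 2.7, 3.0                # cutoff radii, Angstrom

def f_cut(r):
    """Smooth cutoff f_C(r): 1 inside R, cosine ramp between R and S, 0 beyond S."""
    if r < R:
        return 1.0
    if r < S:
        return 0.5 * (1.0 + math.cos(math.pi * (r - R) / (S - R)))
    return 0.0

def pair_energy(r, b_ij=1.0):
    """V_ij = f_C(r) [ A e^{-lam r} - b_ij B e^{-mu r} ]  (eV)."""
    return f_cut(r) * (A * math.exp(-LAM * r) - b_ij * B * math.exp(-MU * r))
```

Near the crystalline Si bond length (~2.35 Å) the attractive term dominates and the pair energy is negative; at short separations the repulsive exponential dominates, as expected for a bonding curve.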

CHAPTER 5

Ab Initio Calculations

'Ab initio' is a term used to describe a solution of the non-relativistic, time-independent Schrödinger equation:

Hψ = Eψ,

where H is the Hamiltonian operator for the system, which is a function of the kinetic and potential energies of the particles of the system, ψ is the wavefunction, and E is the energy. The Hamiltonian for an N-electron atom is

H = -(ħ²/2m_n) ∇_n² - (ħ²/2m_e) Σ_{i=1}^{N} ∇_i² - Σ_{i=1}^{N} Z e²/r_i + Σ_{i=1}^{N-1} Σ_{j=i+1}^{N} e²/r_ij.   (5.1)

The first term on the right-hand side of Eqn. (5.1) represents the kinetic energy of the nucleus. It contains the coordinates of the nucleus and derivatives with respect to those coordinates, but it does not contain the coordinates of any of the N electrons or derivatives with respect to those coordinates. Therefore, it is called a zero-electron term. The second and third terms in Eqn. (5.1) contain the coordinates of one electron each and hence are called one-electron terms, whereas the last term, the electron-electron repulsion, depends on the coordinates of two electrons and hence is called a two-electron term.

5.1 Born-Oppenheimer approximation

It is computationally impossible to solve Eqn. (5.1) exactly for anything other than the hydrogen atom. For this reason, for a many-body system, approximations are introduced to simplify the calculations. One of these is the Born-Oppenheimer approximation. Note that the nuclear mass appearing in the denominator of the nuclear kinetic energy term is much larger than the mass of the electron. This large mass means that the nuclei are moving very slowly relative to the electrons, so the nuclei can be considered to be moving in the average potential generated by the fast-moving electrons. This approximation allows the electronic Hamiltonian to be considered for a fixed set of nuclear coordinates.

5.2 Basis sets

The next approximation involves expressing the molecular orbitals as linear combinations of a pre-defined set of one-electron functions known as basis functions. A basis set is the mathematical description of the orbitals within a system, which in turn combine to approximate the total electronic wavefunction. These basis functions are usually centered on the atomic nuclei and so bear some resemblance to atomic orbitals. An individual molecular orbital is defined as:

φ_i = Σ_{µ=1}^{N} c_µi χ_µ.   (5.2)

The coefficients c_µi are known as the molecular orbital expansion coefficients. The basis functions χ_1, ..., χ_N are also chosen to be normalized.
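Because each basis function is normalized, its squared magnitude must integrate to one over all space. A minimal numerical sketch of that check, assuming the standard closed form (2α/π)^{3/4} e^{-αr²} for a normalized s-type Gaussian primitive (a textbook expression, introduced later in this chapter, not derived here; α = 0.5 is an arbitrary illustrative exponent):

```python
import math

def g_s(alpha, r):
    """Normalized s-type Gaussian primitive, spherically symmetric."""
    return (2.0 * alpha / math.pi) ** 0.75 * math.exp(-alpha * r * r)

def norm_integral(alpha, rmax=20.0, n=200000):
    """Integrate |g_s|^2 over all space via the radial volume element 4*pi*r^2."""
    h = rmax / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * h        # midpoint rule
        total += 4.0 * math.pi * r * r * g_s(alpha, r) ** 2 * h
    return total
```

For any positive exponent the integral comes out to 1 within the quadrature error, confirming the normalization condition stated above.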

5.2.1 Slater-type orbitals (STO)

STOs are constructed from a radial part describing the radial extent of the orbital and an angular part describing the shape of the orbital:

χ_µ(r) = C r^{n-1} e^{-ζr} Y_lm(θ, φ).   (5.3)

The radial part r^{n-1} e^{-ζr} depends on the distance r from the origin of the basis function (usually the location of the nucleus), the orbital exponent ζ, and the principal quantum number n. The angular part Y_lm, known as a spherical harmonic, depends on the angular quantum number l and the magnetic quantum number m. The normalization constant C is chosen such that the integral over the square of the basis function yields unity.

5.2.2 Gaussian-type orbitals (GTO)

Gaussian-type orbitals are constructed from a radial and a spherical part, but the radial part has a different dependence on r. The GTO depends on r², so that the product of Gaussian primitives is another Gaussian. This makes the integrals much easier to evaluate, at the cost of accuracy. To compensate for this, more Gaussian functions are combined to obtain more accurate results. Gaussian and other ab initio electronic structure programs use Gaussian-type atomic functions as basis functions. Gaussian functions have the general form:

g(α, r) = c x^a y^b z^c e^{-αr²},   (5.4)

where α is a constant determining the size (radial extent) of the function. In a Gaussian function, e^{-αr²} is multiplied by powers of x, y, z and by a normalization constant c so that:

∫_{all space} g² dv = 1.   (5.5)
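The product property just mentioned can be checked numerically. A one-dimensional sketch with arbitrary illustrative exponents and centers: the product of two Gaussian primitives centered at A and B is itself a single Gaussian centered at P = (αA + βB)/(α + β) with a constant prefactor.

```python
import math

# Gaussian product theorem, 1D check. All numbers are illustrative.
alpha, beta, A, B = 0.7, 1.3, -0.5, 1.1

p = alpha + beta                                 # combined exponent
P = (alpha * A + beta * B) / p                   # combined center
K = math.exp(-alpha * beta / p * (A - B) ** 2)   # constant prefactor

def product_of_two(x):
    """e^{-alpha (x-A)^2} * e^{-beta (x-B)^2}."""
    return math.exp(-alpha * (x - A) ** 2) * math.exp(-beta * (x - B) ** 2)

def single_gaussian(x):
    """The same function written as one Gaussian: K e^{-p (x-P)^2}."""
    return K * math.exp(-p * (x - P) ** 2)
```

The two functions agree at every point, which is the property that makes two-electron integrals over GTOs so much cheaper than over STOs.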

The sum of these exponents, L = a + b + c, is used to define the angular momentum of the basis function: s-type (L=0), p-type (L=1), d-type (L=2), f-type (L=3), g-type (L=4). The following are representative Gaussian functions for s, p_y, and d_xy orbitals, respectively:

g_s(α, r) = (2α/π)^{3/4} e^{-αr²},
g_y(α, r) = (128α⁵/π³)^{1/4} y e^{-αr²},   (5.6)
g_xy(α, r) = (2048α⁷/π³)^{1/4} xy e^{-αr²}.

Linear combinations of primitive Gaussians are used to form the actual basis functions, called contracted Gaussians, which have the form:

χ_µ = Σ_p d_µp g_p,   (5.7)

where the g_p are primitive Gaussians and the d_µp are fixed constants within a given basis set. This results in the following expansion for molecular orbitals:

φ_i = Σ_µ c_µi χ_µ = Σ_µ c_µi Σ_p d_µp g_p.   (5.8)

5.2.3 Types of basis sets

Larger basis sets more accurately approximate the orbitals by imposing fewer restrictions on the locations of electrons in space. In the true quantum mechanical picture, electrons have a finite probability of existing anywhere in space; this limit corresponds to the infinite basis set expansion. Standard basis sets for electronic structure calculations use linear combinations of Gaussian functions to form the orbitals. Basis sets may be classified by the number and type of basis functions that they contain. Basis sets

assign a group of basis functions to each atom within a molecule to approximate its orbitals. These basis functions are composed of linear combinations of Gaussian functions and are referred to as contracted functions; the component Gaussian functions are referred to as primitives. A basis function consisting of a single Gaussian function is termed uncontracted.

Minimal basis set

Single Gaussian functions as described by Eqn. (5.4) are not well suited to describe the spatial extent and nodal characteristics of atomic orbitals. Hence, basis functions are described as sums (contractions) of several Gaussians, as in Eqn. (5.7). A minimal basis set contains the minimum number of basis functions needed for each atom. Minimal basis sets use fixed-size atomic-type orbitals. The STO-3G basis set (Hehre et al., 1969; Collins et al., 1976) is a minimal basis set (although it is not the smallest possible basis set). It uses three Gaussian primitives per basis function, which accounts for the '3G' in its name. 'STO' stands for Slater-type orbitals: the STO-3G basis set approximates Slater-type orbitals with Gaussian functions.

Double zeta, triple zeta and quadruple zeta basis sets

The minimal basis set approximates all orbitals of a given type to be of the same shape. Double zeta basis sets, such as the Dunning-Huzinaga basis set denoted D95 (Dunning and Huzinaga, 1976), form all molecular orbitals from linear combinations of two sizes of functions for each atomic orbital. Dunning's basis set has been derived from an already existing large atomic basis set of nine uncontracted Gaussian primitives of the s-type and five uncontracted Gaussian primitives of the p-type. Six of the nine s-type functions have then been grouped into a single contraction, while the other three s-type functions have

been left alone. Similarly, four of the five p-type functions have been contracted into a single function, while one function was left uncontracted. This yields a basis set of four s-type and two p-type basis functions. In contrast to the split valence basis sets discussed below, the D95 basis set is a full double zeta basis set in that it allocates two basis functions for each atomic orbital of the core as well as of the valence region occupied in the electronic ground state. The double zeta basis set is important because it allows each orbital to be treated separately during Hartree-Fock calculations, which gives a more accurate representation of each orbital. Each atomic orbital is expressed as a sum of two Slater-type orbitals. The two functions are the same except for the value of zeta (ζ), which accounts for how diffuse (large) the orbital is. The two STOs are added in some proportion, as in Eqn. (5.9):

φ_2s(r) = φ_2s^STO(r, ζ_1) + d φ_2s^STO(r, ζ_2),   (5.9)

where the two terms on the right are Slater-type orbitals and d is a constant. The constant d determines how much each STO contributes towards the final orbital. Thus, the size of the atomic orbital can range anywhere between the values of either of the two STOs. The triple and quadruple zeta basis sets use three and four Slater functions, respectively.

Split valence basis set

The description of the valence electrons can be improved over that in the minimal STO-3G basis set if more than one basis function is used per valence orbital. Basis sets of this type are called split valence basis sets. They have two sizes of basis function for each valence orbital. Often, it takes too much effort to calculate a double zeta function for every

orbital. Since the inner-shell electrons are not as vital to the calculation, they are described with a single Slater orbital, and only the valence orbitals are treated with a double zeta description. Examples of split valence basis sets are 3-21G (Binkley et al., 1980; Gordon et al., 1982; Pietro et al., 1982; Dobbs and Hehre, 1986; 1987) and 6-31G (Ditchfield et al., 1971; Hehre et al., 1972; Hariharan and Pople, 1973; 1974; Gordon, 1980; Binning and Curtiss, 1990). The notation 3-21G means summing 3 Gaussians for each inner shell orbital, while each valence orbital is described by two basis functions: 2 Gaussians for the first STO of the valence orbital and 1 Gaussian for the second STO.

Fig 5.1 Description of the split valence basis set notation: in '3-21G', the '3' is the number of Gaussian functions summed to describe each inner shell orbital, the '2' is the number of Gaussian functions that comprise the first STO of the double zeta, and the '1' is the number of Gaussian functions summed in the second STO.

The main difference between 3-21G and 6-31G is that a much larger number of primitives is used in the latter, in the core as well as the innermost valence shell. The use of a contraction of six Gaussian primitives for each core orbital improves the description of the core region significantly. The valence region is described by two basis

functions per atomic orbital. The inner shell is composed of a contraction of three Gaussians, and the outer shell consists of one single Gaussian primitive. 6-31G*, also known as 6-31G(d), and 6-31G**, also known as 6-31G(d,p), are defined as part of the complete basis set methods (Petersson and Al-Laham, 1991; Petersson et al., 1988). Similarly, triple split valence basis sets such as 6-311G, also known as MC-311G (McLean and Chandler, 1980; Krishnan et al., 1980; Wachters, 1970; Hay, 1977; Raghavachari and Trucks, 1989; Binning and Curtiss, 1990; Curtiss et al., 1995; McGrath and Radom, 1991), use three sizes of contracted functions for each orbital type.

Polarized basis set

A better approximation is to account for the fact that orbitals sometimes share qualities of 's' and 'p' orbitals, or of 'p' and 'd', etc., and do not necessarily have the characteristics of only one or the other. As atoms are brought close together, their charge distributions cause a polarization effect (the positive charge is drawn to one side while the negative charge is drawn to the other) which distorts the shape of the atomic orbitals. Split valence basis sets allow orbitals to change size, but not to change shape. Polarized basis sets remove this limitation by adding orbitals with angular momentum beyond what is required for the ground state to the description of each atom. The first step consists of the addition of a set of d-type functions to the basis sets of those atoms which have occupied s- and p-shells in their electronic ground states. For hydrogen, this corresponds to the addition of a set of p-type functions. Two different notations exist to specify the addition of polarization functions. The first notation adds one asterisk to the basis set name to specify the addition of polarization functions to non-hydrogen atoms, while two asterisks symbolize

the addition of polarization functions to all atoms (including hydrogen). The 6-31G** basis set (Frisch et al., 1984) is thus constructed from the split valence 6-31G basis set through the addition of one set of d-functions to all non-hydrogen atoms and one set of p-functions to all hydrogen atoms. In the second notation, the polarization functions are specified explicitly through their angular quantum number: the 6-31G** basis set would then be termed 6-31G(d,p). This latter notation is much more flexible, as multiple sets of polarization functions can be specified much more easily. The notation 6-31G(d) means that the polarization functions for the 6-31G basis appear in the basis set listing as a single set of uncontracted d-type Gaussians. There are six Cartesian d-type Gaussians (x², y², z², xy, yz, zx) e^{-αr²}, which are equivalent to five pure d-type functions (xy, yz, zx, x²-y², 3z²-r²) e^{-αr²} plus one additional s-type function.

Diffuse functions

In chemistry, one is mainly concerned with the valence electrons, which interact with other molecules. However, many of the basis sets concentrate on the main energy located in the inner shell electrons. This is the main area under the wavefunction curve; in Fig 5.2, it is the area to the left of the dotted line. Normally the tail (the area to the right of the dotted line) is not really a factor in calculations.

Fig 5.2 Variation of energy with respect to radius
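The Cartesian-versus-pure counting above generalizes: a Cartesian Gaussian shell of angular momentum L contains (L+1)(L+2)/2 monomials x^a y^b z^c with a+b+c = L, while the pure (spherical-harmonic) shell contains only 2L+1 functions; the surplus consists of lower-angular-momentum contaminants such as the s-like combination x²+y²+z² in the d shell. A small enumeration sketch (illustrative, not from the text):

```python
from itertools import product

def cartesian_count(L):
    """Number of Cartesian monomials x^a y^b z^c of total degree L."""
    return sum(1 for a, b, c in product(range(L + 1), repeat=3) if a + b + c == L)

def pure_count(L):
    """Number of pure spherical-harmonic functions of angular momentum L."""
    return 2 * L + 1
```

For d functions (L=2) the counts are 6 and 5, matching the six Cartesian versus five pure d-type functions mentioned above; for f functions (L=3) they are 10 and 7.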

However, when an atom is in an anion or in an excited state, the loosely bound electrons, which are responsible for the energy in the tail of the wavefunction, become much more important. The theoretical description of negatively charged species is particularly challenging for ab initio molecular orbital theory. This is due to the fact that the excess negative charge spreads outward to a much larger degree than is typically the case for uncharged or positively charged molecules. To compensate for this, diffuse functions are used. Diffuse functions are large-size versions of s- and p-type functions (as opposed to standard valence-size functions). They use small orbital exponents. They allow orbitals to occupy a larger region of space and are important for systems where electrons are relatively far from the nucleus, such as molecules with lone pairs, anions and other systems with significant negative charge, systems in their excited states, and systems with low ionization potentials. Diffuse basis functions are typically added as an additional set of uncontracted Gaussian functions of the same angular momentum as the valence electrons. To reflect the addition of diffuse basis functions on all non-hydrogen atoms, a '+' sign is added to the standard basis set notation. If diffuse s-type functions are also added to the basis set of hydrogen atoms, a second '+' sign is appended. The 6-31+G(d) basis set (Clark et al., 1983) is the 6-31G(d) basis set with diffuse functions added to heavy atoms. The 6-31++G(d) basis set adds diffuse functions to hydrogen as well. Diffuse functions on hydrogen seldom make a significant difference in accuracy.

High angular momentum basis sets

Such basis sets add multiple polarization functions per atom to the triple zeta basis set. For example, the 6-311G(2d) basis set adds two d functions per heavy atom instead of just one. Such basis sets are useful for describing the interactions between

electrons in electron correlation methods; they are not generally needed for Hartree-Fock calculations, to be discussed in the next section.

Basis sets for post-third-row atoms

Basis sets for atoms beyond the third row of the periodic table are handled somewhat differently. For very large nuclei, the electrons near the nucleus are treated in an approximate way using effective core potentials (ECP). This treatment includes relativistic effects. The LANL2DZ basis set is an example of such a basis set.

5.3 Hartree-Fock method

Molecular orbital theory decomposes the wavefunction into a combination of molecular orbitals φ_1, φ_2, ..., which are chosen to form a normalized, orthogonal set:

∫ φ_i* φ_i dx dy dz = 1,   (5.10)
∫ φ_i* φ_j dx dy dz = 0,   i ≠ j.   (5.11)

The simplest way of expressing the wavefunction of a many-electron system is to take the product of the spin-orbitals of the individual electrons. For the case of two electrons:

ψ(x_1, x_2) = φ_1(x_1) φ_2(x_2).   (5.12)

This is known as a Hartree product. However, such a function is not antisymmetric, since swapping the orbitals of two electrons does not result in a sign change, i.e.

ψ(x_1, x_2) ≠ -ψ(x_2, x_1).   (5.13)

Therefore the Hartree product does not satisfy the Pauli principle. This problem can be overcome by taking a linear combination of both Hartree products:

ψ(x_1, x_2) = (1/√2) {φ_1(x_1) φ_2(x_2) - φ_1(x_2) φ_2(x_1)},   (5.14)

where the coefficient 1/√2 is the normalization factor. This wavefunction is antisymmetric. Moreover, it also goes to zero if any two orbitals or any two electrons are the same. The expression can be generalized by writing it as a determinant, called the Slater determinant:

ψ(x_1, x_2, ..., x_N) = (1/√N!) ×
| φ_1(x_1)  φ_2(x_1)  ...  φ_N(x_1) |
| φ_1(x_2)  φ_2(x_2)  ...  φ_N(x_2) |
|   ...       ...     ...     ...   |
| φ_1(x_N)  φ_2(x_N)  ...  φ_N(x_N) |.   (5.15)

A single Slater determinant is used as an approximation to the electronic wavefunction in Hartree-Fock theory. In more accurate theories, such as configuration interaction and multiconfiguration self-consistent field (MCSCF) calculations (Hegarty and Robb, 1979; Eade and Robb, 1981; Schlegel and Robb, 1982; Bernardi et al., 1984; Frisch et al., 1992), a linear combination of Slater determinants is used. Each row is formed by representing all possible assignments of electron i to all orbital-spin combinations. Swapping two electrons corresponds to interchanging two rows of the determinant, which has the effect of changing its sign. The original Hartree method expresses the total wavefunction of the system as a product of one-electron orbitals. In the Hartree-Fock method (Hehre et al., 1986), the wavefunction is an antisymmetrized determinantal product of one-electron orbitals (the Slater determinant). Schrödinger's equation is transformed into a set of Hartree-Fock equations. Now the problem is to solve for the set of molecular orbital expansion

coefficients c_µi in Eqn. (5.8). Hartree-Fock theory takes advantage of the variational principle, which states that for the ground state, the expectation value of the energy corresponding to any antisymmetric normalized function Φ of the electronic coordinates will always be greater than the energy of the exact wavefunction ψ:

E(Φ) > E(ψ).   (5.16)

In other words, the energy of the exact wavefunction serves as a lower bound to the energies calculated from any other normalized antisymmetric function. Thus the problem becomes one of finding the set of coefficients that minimizes the energy of the resultant wavefunction. The variables in the Hartree-Fock equations depend on themselves, so they must be solved in an iterative manner. Convergence may be improved by changing the form of the initial guess. Since the equations are solved self-consistently, Hartree-Fock is an example of a self-consistent field (SCF) method. A Hartree-Fock calculation involves the following steps:

1. Begin with a set of approximate orbitals for all the electrons in the system.
2. One electron is selected, and the potential in which it moves is calculated by freezing the distribution of all the other electrons and treating their averaged distribution as the centrosymmetric source of the potential.
3. The Schrödinger equation is solved for this potential, which gives a new orbital for that electron.
4. The procedure is repeated for all the other electrons in the system, using the electrons in the frozen orbitals as the source of the potential.
5. At the end of one cycle, there are new orbitals from the original set.

6. The process is repeated until there is little or no change in the orbitals.

The unrestricted Hartree-Fock method is capable of treating the unpaired electrons of open-shell systems. In this case the α and β electrons are in different orbitals, resulting in two sets of molecular orbital expansion coefficients:

φ_i^α = Σ_µ c_µi^α χ_µ,
φ_i^β = Σ_µ c_µi^β χ_µ.

The two sets of coefficients result in two sets of Fock matrices, producing two sets of orbitals. These separate orbitals provide proper dissociation to separate atoms and correct delocalized orbitals for resonant systems. However, the eigenfunctions are not pure spin states but contain some amount of spin contamination from higher states; for example, doublets are contaminated to some degree by functions corresponding to quartets and higher states.

This might give the impression that molecular orbitals are real. Except for the hydrogen atom, that is not the case. Molecular orbitals are the product of Hartree-Fock theory, which is an approximation to the Schrödinger equation. The approximation is that each electron feels only the average Coulomb repulsion of all the other electrons. This approximation makes Hartree-Fock theory much simpler than the real problem, which is an N-body problem. Unfortunately, in many cases this approximation is rather serious and can give poor answers. It can be corrected by explicitly accounting for the electron correlation by many-body perturbation theory (MBPT), configuration interaction (CI), or density functional theory (DFT).
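The iterative procedure of steps 1-6 can be illustrated with a deliberately tiny toy model rather than a real Fock problem: a single orbital "density" n produces a mean-field potential U·n, which shifts an energy level e_0 + U·n, which in turn determines a new n through a Fermi-like occupation. The loop repeats until the density stops changing, exactly as in step 6. All numbers here are illustrative, not from the thesis.

```python
import math

E0, U = -1.0, 2.0   # illustrative bare level and mean-field coupling

def new_density(n):
    """Steps 2-3: level in the frozen field, then a new occupation from it."""
    e = E0 + U * n
    return 1.0 / (1.0 + math.exp(e))

def scf(n=0.2, tol=1e-10, max_iter=200):
    """Step 6: iterate to self-consistency; damped mixing aids convergence."""
    for _ in range(max_iter):
        n_next = new_density(n)
        if abs(n_next - n) < tol:
            return n_next
        n = 0.5 * (n + n_next)   # mix old and new density
    raise RuntimeError("SCF did not converge")
```

For these parameters the self-consistent density is exactly n = 0.5 (where e_0 + U·n = 0), and the damped iteration reaches it from any starting guess in a few dozen cycles, mirroring how real SCF convergence depends on mixing and the initial guess.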

5.4 Electron correlation methods

Hartree-Fock theory provides an inadequate treatment of the correlation between the motions of electrons within a molecular system, especially that arising between electrons of opposite spin. When Hartree-Fock theory fulfills the requirement that ψ be invariant with respect to the exchange of any two electrons by antisymmetrizing the wavefunction, it automatically includes the major correlation effects arising from pairs of electrons with the same spin. This correlation is termed exchange correlation. The motion of electrons of opposite spin remains uncorrelated under Hartree-Fock theory. Any method which goes beyond SCF in attempting to treat this phenomenon is known as an electron correlation method or post-SCF method.

5.4.1 Requirements of electron correlation theories

Any electron correlation theory should provide a unique total energy for each electronic state at a given geometry and should also provide continuous potential energy surfaces as the geometry changes. The resulting energy should be variational, i.e. it should be an upper bound to the exact energy. The most important criterion for an accurate electron correlation theory is the property of size consistency. This term refers to the linear scaling of the energy with the number of electrons in the system. A size consistent method leads to additive energies for infinitely separated systems. Size consistency is necessary to reach quantitative accuracy. Another important criterion is correctness for two-electron systems. For any molecular system composed of reasonably well defined electron pair bonds, such methods provide excellent starting points for the inclusion of additional corrections such as those from three-electron correlations. Such three-electron correlations, although computationally expensive, are crucial to reach quantitative

accuracy (Raghavachari and Anderson, 1996). In general, correlation techniques are computationally more expensive than HF methods. Many correlation techniques involve an iterative solution of a set of coupled equations.

5.4.2 Configuration interaction (CI)

Among the many schemes introduced to overcome the deficiencies of the Hartree-Fock method, the simplest and most general technique to address the correlation problem is configuration interaction (Shavitt, 1977; Schaefer, 1977; Boys, 1950). Configuration interaction is a straightforward application of the linear variational technique to the calculation of electronic wavefunctions. A linear combination of configurations (Slater determinants) is used to provide a better variational solution to the exact many-electron wavefunction. The most important type of correlation effect contributing to chemical bonding is usually termed left-right correlation. For H2, this refers to the tendency that when one electron is near the first hydrogen, the other electron tends to be near the second hydrogen. This is absent in the HF method, where the spatial positions of the two electrons occupying the lowest bonding molecular orbital are uncorrelated. The problem gets worse as the two atoms move apart and dissociate. Qualitatively, this can be corrected by including a second configuration in which both electrons occupy the antibonding orbital. While this is unfavorable energetically, a mixture of the HF configuration with this second configuration provides a better description of the system. This is referred to as configuration interaction. The second configuration has only a small weight at the equilibrium distance in H2, but its weight increases as the bond

distance increases, until the configurations have equal weights at dissociation. Such left-right correlation is included in valence-bond-type wavefunctions. The exact wavefunction cannot be expressed as a single determinant. The CI method constructs other determinants by replacing one or more occupied orbitals within the Hartree-Fock determinant with virtual orbitals. In a single substitution, a virtual orbital a replaces an occupied orbital i within the determinant; this is equivalent to exciting an electron to a higher energy orbital. Similarly, in a double substitution two occupied orbitals are replaced by virtual orbitals, a triple substitution would exchange three orbitals, and so on. The CI wavefunction mixes the Hartree-Fock wavefunction with singly, doubly, triply, and quadruply excited configurations, and the coefficients which determine the amount of mixing are determined variationally. The full CI method forms the wavefunction as a linear combination of the Hartree-Fock determinant and all possible substituted determinants:

Ψ = b_0 ψ_0 + Σ_{s>0} b_s ψ_s,   (5.17)

where the 0-indexed term is the Hartree-Fock determinant and s runs over all possible substitutions. The b_s are the set of coefficients to be solved for by minimizing the energy of the resultant wavefunction. If all possible excited configurations are included, the method gives the exact solution within the space spanned by the given basis set. Full CI is the most complete non-relativistic treatment of the molecular system possible, within the limitation imposed by the chosen basis set.
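The cost of full CI can be made concrete by counting determinants. For m spatial orbitals holding n_α spin-up and n_β spin-down electrons, the number of Slater determinants is C(m, n_α)·C(m, n_β), a standard combinatorial result (not stated explicitly in the text above), and it grows explosively with system size:

```python
from math import comb

def fci_dimension(m, n_alpha, n_beta):
    """Number of determinants in a full CI expansion:
    independent placements of the spin-up and spin-down electrons."""
    return comb(m, n_alpha) * comb(m, n_beta)
```

Minimal-basis H2 (2 orbitals, 1 electron of each spin) has only 4 determinants, but 10 electrons in 20 orbitals already give over 34 billion, which is why practical CI truncates the expansion.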

5.4.3 Limited configuration interaction

The full CI method is size consistent and variational. However, it is computationally very expensive and impractical for large systems, since the number of configurations in the full CI expansion grows exponentially with the size of the system. Practical CI methods augment the Hartree-Fock determinant by adding only a limited set of substitutions. The CIS method adds single excitations to the Hartree-Fock determinant, while CISD adds singles and doubles. The CISD method (Pople et al., 1977; Krishnan et al., 1980; Raghavachari and Pople, 1981) is an iterative technique where the computational cost of each iteration scales as the sixth power of the size of the system. A disadvantage of the limited CI methods is that they are not size consistent, i.e. the energy does not scale linearly with the size of the system, and the CISD energy is not additive for infinitely separated systems. For example, the CISD energy for two infinitely separated He atoms is different from twice the energy of a single He atom. The quadratic configuration interaction (QCI) technique (Pople et al., 1987) introduces size consistency into CISD theory. The CISD method consists of a set of linear equations in the configuration expansion coefficients (for single and double excitations) which are solved iteratively. In the QCISD method (Gauss and Cremer, 1988; Trucks and Frisch, 1998; Salter et al., 1989), these equations are modified by the introduction of additional terms, quadratic in the expansion coefficients, which make the method size consistent.

5.4.4 Møller-Plesset perturbation theory (MP)

MP theory (Møller and Plesset, 1934) treats the electron correlation as a perturbation on the Hartree-Fock problem. It adds higher excitations to Hartree-Fock

theory as a non-iterative correction. The wavefunction and energy are expanded in a power series of the perturbation. The Hamiltonian is expressed in two parts:

H = H_0 + H',   (5.18)

where H_0 is called the zeroth-order Hamiltonian and H' is the perturbation. H_0 is chosen such that the Schrödinger equation H_0 ψ_0 = E_0 ψ_0 can be solved exactly for the zeroth-order eigenfunctions ψ_0 and the corresponding zeroth-order energy eigenvalues E_0. It is assumed that the perturbation is sufficiently small that its presence does not appreciably alter the eigenfunctions of the system. The Hartree-Fock energy is correct to first order; hence the perturbation energies start contributing at second order. When performing an MP calculation, an HF calculation is performed first, and then the second, third, fourth, and fifth order perturbation corrections are computed. The final ground state energy is given by

E_ground state = E_HF + E^(2) + E^(3) + E^(4) + E^(5) + ...,   (5.19)

where E_HF is the Hartree-Fock energy and the E^(i) are the successive Møller-Plesset perturbation corrections. In perturbation theory, the different correlation contributions emerge through their interaction with the starting HF wavefunction ψ_0. Since the Hamiltonian contains only one- and two-electron terms, only single and double excitations can contribute via direct coupling to ψ_0 in the lowest orders. However, the self-consistent optimization of the HF wavefunction prevents direct mixing between single excitations and ψ_0. Thus, the second- and third-order energies have contributions only from double excitations. In higher orders, there is indirect coupling via double excitations, and thus the fourth- and fifth-order energies have contributions from single, double, triple, and quadruple excitations.
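The structure of Eqns. (5.18)-(5.19) can be checked on a two-level toy problem (illustrative matrices, not a molecular Hamiltonian). Take H_0 = diag(0, 4) and a perturbation that couples the two levels with strength v; standard second-order perturbation theory gives E^(2) = |v|²/(E_0 - E_1) = -v²/4, which should approach the exact ground-state shift as v becomes small.

```python
import math

def exact_ground(v):
    """Lowest eigenvalue of [[0, v], [v, 4]]: 2 - sqrt(4 + v^2)."""
    return 2.0 - math.sqrt(4.0 + v * v)

def second_order(v):
    """Second-order correction E2 = |v|^2 / (E0 - E1) with E0 = 0, E1 = 4."""
    return v * v / (0.0 - 4.0)
```

For v = 0.1 the exact shift and the second-order estimate agree to about one part in a thousand, and the agreement improves rapidly as v shrinks, which is the sense in which each MPn order refines the Hartree-Fock energy.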

A common notation, MPn (Raghavachari and Pople, 1978; Raghavachari et al., 1980; Frisch et al., 1980), is used. Thus, MP2 (Head-Gordon et al., 1988; Frisch et al., 1990; Head-Gordon, 1994; Trucks et al., 1998; Saebo and Almlof, 1989), MP3 (Pople et al., 1977; 1976), and MP4 (Raghavachari and Pople, 1978) denote the total energies correct to second, third, and fourth order, respectively. Perturbation theory truncated at any order is size consistent. The MP2, MP3, MP4, and MP5 methods scale as the fifth, sixth, seventh, and eighth power of the size of the system, respectively. If triples are excluded from the MP4 method (Raghavachari and Pople, 1978; Raghavachari et al., 1980; Frisch et al., 1980; Bartlett and Shavitt, 1977; Bartlett and Purvis, 1978; Bartlett et al., 1983; Wilson and Saunders, 1979), the resulting MP4(SDQ) technique (Raghavachari and Pople, 1978) can be evaluated with sixth-order computational dependence. Fifth-order theory (MP5) (Kucharski and Bartlett, 1986; Kucharski et al., 1989; Bartlett et al., 1990; Raghavachari et al., 1990) has been implemented, though it is feasible only for small systems. Analytical derivatives of the potential energy can be computed for the MP2 (Frisch et al., 1990; Pople et al., 1979; Handy and Schaefer, 1984), MP3, and MP4(SDQ) methods (Trucks et al., 1988). Analytical frequencies can be computed for the MP2 method (Head-Gordon, 1994; Trucks et al., 1998). MP2 is the most widely applicable method for treating electron correlation effects in large systems. It can be implemented without requiring the storage of the two-electron integrals and many other intermediate quantities, which makes it possible to study large systems such as C60.

5.4.5 Density functional theory (DFT)

Shortly after the formulation of quantum mechanics in the mid-1920's, Thomas (1926) and Fermi (1928) introduced the idea of expressing the total energy of a system as

a functional of the total electron density. Because of the crude treatment of the kinetic energy term, i.e., the absence of molecular orbitals, the accuracy of these early attempts was far from satisfactory. It was not until the 1960's, motivated by the search for practical electronic structure calculations, that an exact theoretical framework called density functional theory (DFT) was formulated by Hohenberg and Kohn (1964) and Kohn and Sham (1965), providing the foundation for accurate calculations. In contrast to the Hartree-Fock picture, which deals with a description of individual electrons interacting with the nuclei and all other electrons in the system, in density functional theory the total energy is decomposed into three contributions: a kinetic energy, a Coulomb energy due to classical electrostatic interactions among all charged particles in the system, and a term called the exchange-correlation energy that captures all many-body interactions. In density functional theory (Hohenberg and Kohn, 1964; Kohn and Sham, 1965), the correlation energy, along with the exchange energy, is treated as a functional of the three-dimensional electron density. DFT partitions the electronic energy as:

E = E_T + E_V + E_J + E_XC, (5.10)

where E_T is the kinetic energy term, E_V includes terms describing the potential energy of the nuclear-electron attraction and of the repulsion between pairs of nuclei, E_J is the electron-electron repulsion term, and E_XC is the exchange-correlation term. All terms in Eqn. (5.10), except the nuclear-nuclear repulsion, are functionals of the electron density. The term E_XC accounts for the exchange energy arising from the antisymmetry of the quantum mechanical wavefunction, and for dynamic correlation in the motions of the individual electrons. Hohenberg and Kohn (1964) demonstrated that E_XC is

determined entirely by the electron density, i.e., it is a functional of the electron density. The term E_XC is usually approximated as an integral involving only the spin densities and possibly their gradients:

E_XC(ρ) = ∫ f(ρ_α(r), ρ_β(r), ∇ρ_α(r), ∇ρ_β(r)) d³r, (5.11)

where ρ_α and ρ_β refer to the α and β spin densities, and the total electron density is ρ = (ρ_α + ρ_β). E_XC is separated into exchange and correlation parts, corresponding to same-spin and mixed-spin interactions:

E_XC(ρ) = E_X(ρ) + E_C(ρ), (5.12)

where the terms E_X(ρ) and E_C(ρ), again functionals of the electron density, are termed the exchange functional and the correlation functional. Local functionals depend on the electron density alone, while gradient-corrected functionals depend on both the electron density and its gradient. In the local density approximation (LDA), the exchange-correlation energy is taken from the known results for the many-electron interactions in an electron system of constant density (a homogeneous electron gas). The LDA assumes that at each point in a molecule or solid there exists a well defined electron density; an electron at such a point experiences the same many-body response by the surrounding electrons as if the density of these surrounding electrons had the same value throughout the entire space as at the point of the reference electron. The exchange-correlation energy of the total molecule or solid is then the integral over the contributions from each volume element. The contributions differ from one volume element to another depending on the local electron density. The local exchange functional is defined as

E_X^LDA = −(3/2) (3/(4π))^(1/3) ∫ [ρ_α^(4/3) + ρ_β^(4/3)] d³r. (5.13)

In DFT, the total electron density is decomposed into one-electron densities, which are constructed from one-electron wave functions. These one-electron wave functions are similar to those of Hartree-Fock theory. For molecular systems, DFT leads to a molecular orbital (MO) picture in analogy to the Hartree-Fock approach. DFT has been successfully extended to open-shell systems and magnetic solids (von Barth and Hedin, 1972; Gunnarsson et al., 1972). In these cases, the local exchange-correlation energy depends not only on the local electron density, but also on the local spin density (the difference between the electron density of spin-up electrons and that of spin-down electrons). The resulting generalization of the LDA is called the local spin density approximation (LSDA). Within the local density approximation, binding energies are typically overestimated. Any real system is spatially inhomogeneous, i.e., it has a spatially varying density, and the accuracy can be improved if the rate of this variation is included in the functional. The gradient expansion approximation (GEA) tries to calculate the gradient corrections to the LDA systematically. In generalized gradient approximations (GGA), generalized functions are used instead of power-series gradient expansions. Inclusion of non-local gradient corrections improves the computed binding energies. Becke (1988) formulated a gradient-corrected exchange functional based on the LDA exchange functional, given as

E_X^Becke = E_X^LDA − γ ∫ ρ^(4/3) x² / (1 + 6γ x sinh⁻¹(x)) d³r, (5.14)
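As a small numerical illustration of the local exchange functional, the sketch below evaluates the spin-compensated (closed-shell) Dirac/Slater form of the LDA exchange, which is equivalent to the spin-resolved expression used in the text; the density and volume values are arbitrary illustrative choices in atomic units:

```python
import math

# Closed-shell LDA (Dirac) exchange: E_x = -C_X * integral rho^(4/3) d^3r,
# with C_X = (3/4)*(3/pi)^(1/3). For a uniform gas the integral is just
# the exchange energy density times the volume.
C_X = 0.75 * (3.0 / math.pi) ** (1.0 / 3.0)

def lda_exchange_density(rho):
    """Exchange energy per unit volume, -C_X * rho^(4/3) (atomic units)."""
    return -C_X * rho ** (4.0 / 3.0)

rho, volume = 1.0, 10.0   # illustrative uniform density and box volume
print(round(lda_exchange_density(rho) * volume, 4))   # -> -7.3856
```

The negative sign reflects that exchange lowers the total energy; the ρ^(4/3) scaling is what the gradient-corrected functionals of the following equations build upon.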

where x = ρ^(−4/3)|∇ρ|, and γ is a parameter chosen to fit the known exchange energies of the inert gas atoms (Becke determined γ = 0.0042). The Becke functional is thus defined as a correction to the local LDA exchange functional. Self-consistent Kohn-Sham DFT calculations are performed in an iterative manner analogous to the SCF computation. Becke also formulated hybrid functionals, which include Hartree-Fock and DFT exchange along with DFT correlation:

E_XC^hybrid = c_HF E_X^HF + c_DFT E_XC^DFT. (5.15)

For example, the Becke-style three-parameter functional denoted B3LYP (Becke, 1993) is defined as:

E_XC^B3LYP = E_X^LDA + c_0 (E_X^HF − E_X^LDA) + c_X ΔE_X^Becke + E_C^VWN3 + c_C (E_C^LYP − E_C^VWN3). (5.16)

The parameter c_0 allows any admixture of Hartree-Fock and LDA local exchange to be used. The local correlation functional VWN3 is corrected by the LYP correlation correction. Table 5.1 lists various model chemistries definable via ab initio methods and standard basis sets. Each cell in the chart defines a model chemistry.

Table 5.1 Effect of various model chemistries and basis sets on the accuracy of the solution to Schrödinger's equation (Foresman and Frisch).
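The B3LYP mixing can be illustrated by assembling the total exchange-correlation energy from its component energies. The component values below are placeholder numbers purely to show the bookkeeping, not real calculation results; the mixing parameters c_0 = 0.20, c_X = 0.72, c_C = 0.81 are the standard B3LYP values:

```python
# B3LYP exchange-correlation energy assembled from component energies:
# E_XC = E_X_LDA + c0*(E_X_HF - E_X_LDA) + cX*dE_X_Becke
#        + E_C_VWN3 + cC*(E_C_LYP - E_C_VWN3)
def b3lyp_xc(e_x_lda, e_x_hf, de_x_becke, e_c_vwn3, e_c_lyp,
             c0=0.20, cx=0.72, cc=0.81):
    return (e_x_lda + c0 * (e_x_hf - e_x_lda) + cx * de_x_becke
            + e_c_vwn3 + cc * (e_c_lyp - e_c_vwn3))

# Placeholder component values (hartree), chosen only for illustration:
print(round(b3lyp_xc(-10.0, -10.5, -0.4, -0.9, -1.0), 4))   # -> -11.369
```

Setting c0 = 1 and cx = cc = 0 would recover pure Hartree-Fock exchange with VWN3 correlation, which shows how the three parameters interpolate between the limiting methods.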

The columns correspond to different theoretical methods and the rows to different basis sets. The level of correlation increases from left to right across any row, with the Hartree-Fock method at the extreme left (including no correlation) and the full configuration interaction method at the right (which fully accounts for electron correlation). The rows of the chart correspond to increasingly larger basis sets. The bottom row of the chart represents a completely flexible basis set. The cell in the lower right corner of the chart represents the exact solution of the Schrödinger equation.

CHAPTER 6

NEURAL NETWORKS

As the term neural network (NN) implies, this approach is aimed at modeling real networks of neurons in the brain. It was inspired by a number of features of the brain that would be desirable in artificial systems, such as its robustness and fault tolerance, its flexibility, its ability to deal with fuzzy and noisy information, and its highly parallel structure.

6.1 Biological neurons

The brain is composed of about 10^11 highly connected elements called neurons, each of which is connected to about 10^4 other neurons. Fig 6.1 shows a schematic of a biological neuron.

Fig 6.1 Schematic of a biological neuron

A neuron's dendritic tree is connected to thousands of neighboring neurons. Each neuron receives electrochemical inputs from other neurons at the dendrites. When one of those neurons fires, a positive or negative charge is received by one of the dendrites. The strengths of all the received charges are added together through the processes of spatial and temporal summation. Spatial summation occurs when several weak signals are combined into a single large one, while temporal summation converts a rapid series of weak pulses from one source into one large signal. The aggregate input is then passed to the soma (cell body). The soma and the enclosed nucleus do not play a significant role in the processing of incoming and outgoing data; their primary function is to perform the continuous maintenance required to keep the neuron functional. If the sum of these electrical inputs is sufficiently powerful to activate the neuron, it transmits an electrochemical signal along the axon and passes this signal to the other neurons whose dendrites are attached at any of the axon terminals. These attached neurons may then fire. The part of the soma that does concern itself with the signal is the axon hillock. If the aggregate input is greater than the axon hillock's threshold value, the neuron fires, and an output signal is transmitted down the axon. The strength of the output is constant, regardless of whether the input was just above the threshold or a hundred times as great. The output strength is unaffected by the many divisions in the axon; it reaches each terminal button with the same intensity it had at the axon hillock. This uniformity is critical in an analogue device such as the brain, where small errors can be critical and where error correction is more difficult than in a digital system. It is important to note that a neuron fires only if the total signal received at the cell body exceeds a certain level. The neuron either fires or it does not; there are no different grades of firing.

Each terminal button is connected to other neurons across a small gap called a synapse, as shown in Fig 6.2. The physical and neuro-chemical characteristics of each synapse determine the strength and polarity of the new input signal. This is where the brain is most flexible.

Fig 6.2 Synaptic gap

Changing the constitution of the various neurotransmitter chemicals can increase or decrease the amount of stimulation that the firing axon imparts on the neighboring dendrite. Altering the neurotransmitters can also change whether the stimulation is excitatory or inhibitory. Many drugs, such as alcohol, have dramatic effects on the production or destruction of these critical chemicals. The infamous nerve gas sarin can kill because it neutralizes a chemical (acetylcholinesterase) that is normally responsible for the destruction of a neurotransmitter (acetylcholine). This means that once a neuron fires, it keeps on triggering all the neurons in the vicinity, and one no longer has control over the muscles. So, from a very large number of extremely simple processing units (each performing a weighted sum of its inputs, and then firing a binary signal if the total input

exceeds a certain level), the brain manages to perform extremely complex tasks. This is the model on which artificial neural networks (NN) are based. Thus far, artificial neural networks have not even come close to modeling the complexity of the brain. It is worth noting that even though biological neurons are slow compared to electrical circuits, the brain is able to perform many tasks much faster than any conventional computer because of the massively parallel structure of biological neural networks. While in actual neurons the dendrite receives electrical signals from the axons of other neurons, in an artificial neuron (perceptron) these electrical signals are represented as numerical values. At the synapses between the dendrite and axons, electrical signals are modulated in various amounts. This is modeled in an artificial neuron by multiplying each input value by a value called the weight. An actual neuron fires an output signal only when the total strength of the input signals exceeds a certain threshold. We model this phenomenon in an ANN by calculating the weighted sum of the inputs to represent the total strength of the input signals, and applying a step function to the sum to determine its output. Fig 6.3 shows a simple model of a neuron.

Fig 6.3 Model of a neuron
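The weighted-sum-and-threshold model described above can be sketched in a few lines of Python. The weights, bias, and example wiring below are illustrative choices, not values from the thesis:

```python
def perceptron(inputs, weights, bias):
    """McCulloch-Pitts style neuron: weighted sum of the inputs
    followed by a hard threshold (step function)."""
    n = sum(w * p for w, p in zip(weights, inputs)) + bias
    return 1 if n >= 0 else 0   # fires (1) or does not fire (0)

# Example: a neuron wired to compute the logical AND of two binary inputs.
weights, bias = [1.0, 1.0], -1.5
print([perceptron([a, b], weights, bias) for a in (0, 1) for b in (0, 1)])
# -> [0, 0, 0, 1]
```

Only the (1, 1) input drives the weighted sum above the threshold, so only that pattern makes the neuron fire, mirroring the all-or-nothing firing described for the biological neuron.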

6.2 History of multilayer neural networks

The first step toward artificial neural networks came in 1943, when Warren McCulloch and Walter Pitts (1943) wrote the paper "A logical calculus of the ideas immanent in nervous activity," describing how neurons might work; it united the studies of neurophysiology and mathematical logic. They modeled a simple neural network with electrical circuits and showed that the network could, in principle, compute any arithmetic function. Training of neural networks was introduced when Hebb (1949) published the book "The organization of behavior," which pointed out that neural pathways are strengthened each time they are used, a concept fundamentally essential to the ways in which humans learn: if two nerves fire at the same time, the connection between them is enhanced. Frank Rosenblatt (1958) invented the perceptron and its associated learning rule, and built it in hardware. A single-layer perceptron was found to be useful in classifying a continuous-valued set of inputs into one of two classes. This was the first practical application of artificial neural networks. In 1959, Bernard Widrow and Marcian Hoff developed models called ADALINE (ADAptive LInear NEuron) and MADALINE (Multiple ADAptive LInear NEurons). ADALINE was developed to recognize binary patterns, so that while reading streaming bits from a phone line it could predict the next bit. MADALINE was the first neural network applied to a real-world problem, using an adaptive filter to eliminate echo on phone lines. In 1960, Widrow and Hoff developed a learning rule to train ADALINE networks. Minsky and Papert (1969) showed that perceptron and ADALINE networks were limited to classification problems in which the two classes are linearly separable. These networks were incapable of handling nonlinear classification problems, such as the exclusive-or (XOR)

problem. Rosenblatt and Widrow were aware of these limitations and proposed multilayer networks. The first description of an algorithm to train multilayer networks was given by Werbos (1974). He described it for general networks, with neural networks as a special case. It was rediscovered by Rumelhart, Hinton, and Williams (1986), Parker (1985), and Le Cun (1985). Backpropagation was developed by generalizing the Widrow-Hoff learning rule to multilayer neural networks with nonlinear differentiable transfer functions.

6.3 Neural networks versus conventional computers

Neural networks take a different approach to problem solving than conventional computers. Conventional computers use an algorithmic approach, i.e., the computer follows a set of instructions in order to solve a problem. Unless the specific steps that the computer needs to follow are known, the computer cannot solve the problem. That restricts the problem-solving capability of conventional computers to problems that we already understand and know how to solve, particularly repetitive tasks. But computers would be much more useful if they could do things that we do not exactly know how to do. Neural networks process information in a way similar to the human brain. The network is composed of a large number of highly interconnected processing elements (neurons) working in parallel to solve a specific problem. Neural networks learn by example; they are not programmed to perform a specific task. The examples must be selected carefully, otherwise the network might function incorrectly. The disadvantage is that because the network finds out how to solve the problem by itself, its operation can be unpredictable.

On the other hand, conventional computers use a prescriptive approach to problem solving: the way the problem is to be solved must be known and stated in unambiguous instructions. These instructions are then converted to a high-level language program and then into machine code that the computer can understand. Their operation is predictable. Neural networks and conventional algorithmic computers are not in competition but complement each other. There are tasks that are more suited to an algorithmic approach, such as arithmetic operations, and tasks that are more suited to neural networks. Moreover, a large number of tasks require systems that use a combination of the two approaches (normally a conventional computer is used to supervise the neural network) in order to perform at maximum efficiency. Neural networks have been shown to be good at problems that are easy for a human but difficult for a traditional computer, such as image and voice recognition and predictions based on past knowledge.

6.4 Model of a neuron

Fig 6.4 shows a single-input neuron, which forms the building block of the artificial neural network (ANN). The scalar input p is multiplied by the scalar weight w; the other input, the constant 1, is multiplied by the bias b. The summer adds wp and b to form the net input n, which goes into a transfer function f to produce the scalar output a. Here w and b are adjustable scalar parameters of the neuron, which are adjusted by the learning rule so as to minimize the error (i.e., the difference between the neuron outputs and the target values).

Fig 6.4 Single-input neuron (Hagan et al., 1996)

6.5 Multilayer network

Fig 6.5 shows a neural network with multiple layers, each layer having multiple neurons. Each layer has its own weight matrix W and bias vector b. The superscript on each parameter indicates the layer with which it is associated. The network has R inputs. The outputs from one layer are the inputs to the next layer.

Fig 6.5 Multilayer network (Hagan et al., 1996)

A layer whose output is the output of the entire neural network is called the output layer, whereas the other layers are called hidden layers. The number of neurons in the

input layer is the same as the number of input variables, and the number of neurons in the output layer is the same as the number of outputs. The number of neurons to be used in the hidden layers is decided by the designer depending upon the complexity of the function involved.

6.6 Transfer functions

A transfer function is a linear or nonlinear function of the net input to the neuron. Nonlinear transfer functions give neural networks their nonlinear capabilities. The function must be differentiable for the optimization of the parameters, and it is desirable that it saturate at both extremes. The most common forms of transfer functions are the monotonically increasing sigmoidal and hyperbolic tangent functions.

Hyperbolic tangent function:

f(n) = (e^n − e^(−n)) / (e^n + e^(−n)). (6.1)

Log-sigmoid function:

f(n) = 1 / (1 + e^(−n)). (6.2)

(a) (b)

Fig 6.6 a) Hyperbolic tangent function, ranging over [−1, +1], plotted for n = ±5. b) Log-sigmoid function, ranging over [0, +1], plotted for n = ±5.
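The two transfer functions above can be written directly from their definitions; the sample input value of 5 below is an illustrative choice matching the plotting range of Fig 6.6:

```python
import math

def tanh_transfer(n):
    """Hyperbolic tangent transfer function; output in (-1, 1)."""
    return (math.exp(n) - math.exp(-n)) / (math.exp(n) + math.exp(-n))

def logsig(n):
    """Log-sigmoid transfer function; output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-n))

# Both squash large net inputs toward their saturation limits:
print(round(tanh_transfer(5), 4), round(logsig(5), 4))   # -> 0.9999 0.9933
```

Both functions are smooth and differentiable everywhere, which is what makes them usable with the gradient-based training methods of the following sections.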

These functions are also called squashing functions because of their asymptotic behavior as the net input goes to ±∞. Convergence is faster with functions that are symmetric about the origin, like the hyperbolic tangent (LeCun, 1998).

6.7 Neural network training

Neural network training is a procedure for modifying the weight matrices and bias vectors of the network to meet certain goals, such as minimizing the difference between the network output and the target values. It is equivalent to performing a nonlinear optimization of the network parameters. There are two approaches to neural network training: supervised learning and unsupervised learning. In supervised training, both the inputs and the outputs are provided. The network processes the inputs and compares its resulting outputs against the desired outputs. Errors are then propagated back through the system, causing the system to adjust the weights that control the network. This process occurs over and over, and the weights are continually adjusted. During the training of a network, the same set of data is processed many times as the weights and biases are adjusted. It is very important to have enough data points for the training so as to cover the configuration space adequately. The number of data points required depends on the number of input variables: as the number of input variables increases, the dimensionality of the problem increases, and more points are required in order to cover the entire configuration space. The number of data points required also depends on the complexity of the problem. It is equally important to choose the input variables carefully. Sometimes it may not be obvious which variables to choose so that the input data contain complete information from which the target output is derived. Supervised learning has applications in function approximation

and pattern classification problems, where the number and type of patterns are known beforehand. In unsupervised training, the network is provided with the inputs but not with the desired outputs. The system itself decides what features to use to group the input data. This is referred to as self-organization. These networks look for regularities or trends in the input to adapt the network parameters. Unsupervised learning has applications in clustering operations, where the networks learn to categorize the input patterns into a finite number of classes.

6.7.1 Supervised training and parameter optimization

Supervised training is done by comparing the output with known target values. The optimization of the network parameters is performed by some iterative optimization scheme until a desirable value of the scalar cost function is reached. Normally the cost function is taken as the expected squared difference between the network output and the target values:

F = E[(t − a)²] = E(e²), (6.3)

where t is the target vector, a is the vector of neural network outputs, and e is the difference between the target and the neural network output. E(e²) is the expectation of the sum squared error. The training process can be classified into two types: incremental training and batch training. In incremental training, the weights and biases of the network are updated each time an input is presented to the network, while in batch training the weights and biases are updated after all inputs are presented. Incremental training is comparatively faster, since in batch training the redundancy, or the presence of patterns that are very similar, leads to slower convergence. In incremental training, the noise

present in the updates, coming from the numerical estimate of the gradient after each successive sample, can allow the weights to find a different minimum before getting stuck in a local minimum, unlike in batch training, where the random initialization of the weights is very crucial. However, such noise can prevent full convergence to the minimum. For batch training, the conditions of convergence are well understood; many optimization schemes, such as the conjugate gradient method, are applicable only to batch training, and the analysis of convergence rates is simpler. In order to minimize the cost function, the training algorithm cycles through the following steps:

1. The network is presented with one (for incremental training) or all (for batch training) training samples, consisting of both inputs and targets.
2. The network output is measured, and the squared error between the target and the output is calculated.
3. The weights and biases are adjusted so as to minimize the cost function, i.e., the squared error.
4. The procedure is repeated until the squared error reaches the desired limit.

The optimization of the parameters is usually done using gradient-based methods, such as steepest descent, conjugate gradient, or quasi-Newton algorithms.

6.8 LMS (Least Mean Squared Error) algorithm

The LMS or Widrow-Hoff algorithm is based on an approximate steepest descent algorithm. Widrow and Hoff noticed that an estimate of the mean squared error could be obtained by using the squared error at each iteration:
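The four-step training cycle above can be sketched as a small batch gradient-descent loop for a single linear neuron. The learning rate, tolerance, and training data below are illustrative choices, not values from the thesis:

```python
# Batch gradient-descent training of a single linear neuron a = w*p + b,
# following the present / measure / adjust / repeat cycle described above.
def train(samples, lr=0.1, tol=1e-12, max_epochs=5000):
    w, b = 0.0, 0.0
    for _ in range(max_epochs):
        # 1. present all samples; 2. measure the squared error
        grad_w = grad_b = sse = 0.0
        for p, t in samples:
            e = t - (w * p + b)
            sse += e * e
            grad_w += -2 * e * p   # d(e^2)/dw
            grad_b += -2 * e       # d(e^2)/db
        # 3. adjust the parameters along the negative gradient
        w -= lr * grad_w / len(samples)
        b -= lr * grad_b / len(samples)
        # 4. repeat until the squared error reaches the desired limit
        if sse < tol:
            break
    return w, b

w, b = train([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)])  # targets follow t = 2p + 1
print(round(w, 3), round(b, 3))  # -> 2.0 1.0
```

Because the targets lie exactly on a line, the loop recovers the slope and intercept; on noisy data the same loop would converge to the least-squares fit instead.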

F(x) ≈ e²(k), (6.4)

where k is the iteration number. The partial derivatives of the squared error with respect to the weights and bias at the kth iteration are given by:

∂e²(k)/∂w_{1,j} = 2e(k) ∂e(k)/∂w_{1,j}, (6.5)

and

∂e²(k)/∂b = 2e(k) ∂e(k)/∂b.

The derivative of the error with respect to the weights is

∂e(k)/∂w_{1,j} = ∂[t(k) − a(k)]/∂w_{1,j} = ∂/∂w_{1,j} [t(k) − (Σ_{i=1}^{R} w_{1,i} p_i(k) + b)] = −p_j(k), (6.6)

where R is the number of input variables and p_i(k) is the ith element of the input vector at the kth iteration. Similarly,

∂e(k)/∂b = −1. (6.7)

Therefore

∇F(x) = ∇e²(k) = −2e(k) [p(k); 1],

where p is the input vector and the constant 1 is the input for the bias. That is, the approximate gradient is obtained by multiplying the input by the error. Now the steepest descent method proceeds as

x_{k+1} = x_k − α ∇F(x)|_{x = x_k}. (6.8)

The LMS algorithm can be written in matrix notation as:

W(k+1) = W(k) + 2α e(k) p^T(k), and b(k+1) = b(k) + 2α e(k). (6.9)
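The LMS update of Eq. (6.9) can be sketched directly for a two-input linear neuron; the learning rate α, epoch count, and training data below are illustrative assumptions:

```python
# LMS (Widrow-Hoff) training of a single linear neuron a = w.p + b, using
# the incremental update W(k+1) = W(k) + 2*alpha*e(k)*p(k),
#                        b(k+1) = b(k) + 2*alpha*e(k).
def lms(samples, alpha=0.05, epochs=200):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for p, t in samples:          # incremental: update after each sample
            a = sum(wi * pi for wi, pi in zip(w, p)) + b
            e = t - a
            w = [wi + 2 * alpha * e * pi for wi, pi in zip(w, p)]
            b = b + 2 * alpha * e
    return w, b

# Targets generated from t = 1*p1 - 2*p2 + 0.5 (illustrative data):
data = [([0.0, 0.0], 0.5), ([1.0, 0.0], 1.5),
        ([0.0, 1.0], -1.5), ([1.0, 1.0], -0.5)]
w, b = lms(data)
print([round(x, 2) for x in w], round(b, 2))
```

Each pass applies the update once per sample, which is exactly the incremental-training mode contrasted with batch training in Section 6.7.1.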

6.9 Backpropagation algorithm

Neural networks employing only one layer of neurons can solve only linearly separable classification problems. As noted in Section 6.2, the first description of an algorithm to train multilayer networks was given by Werbos (1974) for general networks, with neural networks as a special case, and it was rediscovered by Rumelhart, Hinton, and Williams (1986), Parker (1985), and Le Cun (1985). Backpropagation was developed by generalizing the Widrow-Hoff learning rule to multilayer neural networks with nonlinear differentiable transfer functions. It is a gradient descent algorithm in which the parameters are adjusted along the negative of the gradient of the cost function. The term backpropagation refers to the manner in which the gradient is computed for nonlinear multilayer networks. With a single-layer network, the weights are adjusted based on the error values, as in LMS (Widrow-Hoff) learning. With multilayer networks, however, it is less obvious how to calculate the error for the hidden layers, as the error from the output layer is not an explicit function of the weights in the hidden layers. The chain rule is applied to compute the gradient for the steepest descent algorithm:

∂F/∂w^m_{i,j} = (∂F/∂n^m_i)(∂n^m_i/∂w^m_{i,j}), (6.10)

and

∂F/∂b^m_i = (∂F/∂n^m_i)(∂n^m_i/∂b^m_i),

where n^m_i is the net input to the ith neuron in the mth layer, and w^m_{i,j} and b^m_i are the weights and biases of that layer. The net input n^m_i is computed as:

n^m_i = Σ_{j=1}^{S^{m−1}} w^m_{i,j} a^{m−1}_j + b^m_i, or in matrix form n^m = W^m a^{m−1} + b^m, (6.11)

where a^{m−1}_j is the output of the jth neuron in layer m−1 and S^{m−1} is the number of neurons in that layer. Now a term called the sensitivity is defined as the change in the cost function F with respect to a change in the ith element of the net input at layer m:

s^m_i = ∂F/∂n^m_i. (6.12)

Therefore

∂F/∂w^m_{i,j} = s^m_i a^{m−1}_j, and ∂F/∂b^m_i = s^m_i. (6.13)

Now the steepest descent algorithm can be expressed as:

W^m(k+1) = W^m(k) − α s^m (a^{m−1})^T, and b^m(k+1) = b^m(k) − α s^m, (6.14)

where W^m and b^m are the weight matrix and bias vector at layer m, and the sensitivity vector is

s^m = ∂F/∂n^m = [∂F/∂n^m_1, ∂F/∂n^m_2, ..., ∂F/∂n^m_{S^m}]^T. (6.15)

The next step is to compute the sensitivities. This process involves a recurrence relationship in which the sensitivity at layer m is computed from the sensitivity at layer m+1, which gives the algorithm the name backpropagation. The Jacobian matrix is defined as:

∂n^{m+1}/∂n^m =
[ ∂n^{m+1}_1/∂n^m_1      ∂n^{m+1}_1/∂n^m_2      ...  ∂n^{m+1}_1/∂n^m_{S^m}      ]
[ ∂n^{m+1}_2/∂n^m_1      ∂n^{m+1}_2/∂n^m_2      ...  ∂n^{m+1}_2/∂n^m_{S^m}      ]
[ ...                                                                            ]
[ ∂n^{m+1}_{S^{m+1}}/∂n^m_1  ∂n^{m+1}_{S^{m+1}}/∂n^m_2  ...  ∂n^{m+1}_{S^{m+1}}/∂n^m_{S^m} ]. (6.16)

Each element ∂n^{m+1}_i/∂n^m_j is computed as:

∂n^{m+1}_i/∂n^m_j = ∂(Σ_{l=1}^{S^m} w^{m+1}_{i,l} a^m_l + b^{m+1}_i)/∂n^m_j = w^{m+1}_{i,j} (∂a^m_j/∂n^m_j) = w^{m+1}_{i,j} ḟ^m(n^m_j), (6.17)

where ḟ^m is the derivative of the transfer function at layer m. Hence, the Jacobian matrix can be written as:

∂n^{m+1}/∂n^m = W^{m+1} Ḟ^m(n^m), where Ḟ^m(n^m) = diag[ḟ^m(n^m_1), ḟ^m(n^m_2), ..., ḟ^m(n^m_{S^m})]. (6.18)

And the recurrence relation can be written as:

s^m = Ḟ^m(n^m) (W^{m+1})^T s^{m+1}. (6.19)

Thus, the sensitivities are propagated backward through the network from the last layer to the first, which gives the algorithm the name backpropagation.

For the output layer (layer M), the sensitivities are computed directly as:

s^M = −2 Ḟ^M(n^M)(t − a), (6.20)

where t is the target value and a is the neural network output.

6.10 Numerical optimization techniques

The mean squared error of a single-layer network with a linear transfer function is a quadratic function with only one minimum and a constant curvature. In such cases, the LMS algorithm guarantees convergence so long as the learning rate is not too large. But for a multilayer network, the mean squared error may be a non-quadratic function whose curvature varies drastically over the parameter space, which makes it difficult to choose the learning rate. To overcome some of these difficulties, heuristic techniques such as a variable learning rate, the use of momentum, and rescaling of the variables are employed. Another approach employs standard numerical optimization techniques, such as the conjugate gradient and quasi-Newton methods.

6.10.1 Conjugate gradient methods

The backpropagation algorithm adjusts the weights along the steepest descent direction. Although this is the direction in which the function decreases most rapidly, it does not necessarily produce the fastest convergence. Instead of using orthogonal search directions, as in the steepest descent method, the conjugate gradient method adjusts the search direction so as to pass through the minimum of the function. The conjugate gradient method uses information about the first derivative of the error function only, and it has a quadratic convergence property, i.e., it converges to the minimum of a quadratic function in a finite number of steps. This is an O(N) method.
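Returning to the backpropagation algorithm of Section 6.9, the forward pass, the output-layer sensitivity, and the backward sensitivity recurrence can be combined into a short gradient routine. The network below (tanh hidden layers, linear output, sum-of-squares cost) and its weights are illustrative assumptions:

```python
import math

def backprop_gradients(p, t, weights, biases):
    """One backpropagation pass for a feedforward network with tanh hidden
    layers and a linear output layer, using the sensitivity recurrence
    s^M = -2*F'(n^M)*(t - a) and s^m = F'(n^m) * (W^{m+1})^T s^{m+1}."""
    a, n = [p], []                      # layer outputs and net inputs
    L = len(weights)
    for m in range(L):                  # forward pass
        nm = [sum(w * x for w, x in zip(row, a[-1])) + b
              for row, b in zip(weights[m], biases[m])]
        n.append(nm)
        a.append([math.tanh(v) for v in nm] if m < L - 1 else nm[:])
    # output-layer sensitivity (linear transfer => derivative is 1)
    s = [-2 * (ti - ai) for ti, ai in zip(t, a[-1])]
    grads_w, grads_b = [None] * L, [None] * L
    for m in range(L - 1, -1, -1):      # backward pass
        grads_w[m] = [[si * aj for aj in a[m]] for si in s]
        grads_b[m] = s[:]
        if m > 0:                       # propagate sensitivities backward
            wt_s = [sum(weights[m][i][j] * s[i] for i in range(len(s)))
                    for j in range(len(a[m]))]
            deriv = [1 - math.tanh(v) ** 2 for v in n[m - 1]]
            s = [d * v for d, v in zip(deriv, wt_s)]
    return grads_w, grads_b

# A 1-2-1 network with illustrative weights and a single training pair:
W = [[[0.5], [-0.3]], [[1.0, 2.0]]]
b = [[0.1, 0.2], [0.0]]
gw, gb = backprop_gradients([1.0], [2.0], W, b)
```

The returned gradients can be checked against finite differences of the sum-of-squares cost, which is a standard sanity test for any backpropagation implementation.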

6.10.2 Newton and quasi-Newton methods

Newton's method. The cost function, i.e., the mean squared error, can be expanded in a Taylor series as:

F(w + Δw) = F(w) + g^T Δw + (1/2) Δw^T H Δw + ..., (6.21)

where g is the gradient vector and H is the Hessian matrix, defined as:

g = ∂F(w)/∂w, H = ∂²F(w)/∂w². (6.22)

A quadratic approximation of the error surface is obtained by ignoring the higher-order terms in the Taylor series expansion. The adjustment to the weights that minimizes this quadratic model is:

Δw = −H^{−1} g. (6.23)

Newton's method requires computing the Hessian matrix, which involves second-order derivatives, and then finding its inverse, which takes O(N³) operations. Moreover, this does not guarantee convergence: if the error surface is not quadratic, the Hessian matrix may not always be positive definite, and the algorithm can diverge. In quasi-Newton methods, the Hessian matrix is estimated by a positive definite matrix, which ensures convergence. For a sum-of-squares cost function, the Hessian can further be approximated as:

H ≈ 2 J^T J, (6.24)

where J is the Jacobian matrix.

6.10.3 Levenberg-Marquardt algorithm

The Hessian matrix in the Gauss-Newton method may not always be invertible. The Levenberg-Marquardt algorithm uses a modified approximation to the Hessian matrix in the update:

Δw = −[J^T J + µI]^{−1} J^T e, (6.25)

where J is the Jacobian matrix consisting of the first derivatives of the network errors with respect to the weights and biases, and e is the vector of network errors. When µ is zero, this reduces to the Gauss-Newton method; when µ is large, it becomes the gradient descent algorithm with a small step size. The Jacobian matrix can be computed through a standard backpropagation technique that is less complex than computing the Hessian matrix (Hagan and Menhaj, 1994).

6.11 Bias/variance dilemma

One of the most serious problems that arises in connectionist learning by neural networks is overfitting of the provided training examples. This means that the trained function fits the training data very closely but does not generalize well, i.e., it cannot model unseen data sufficiently well. The problem is to choose a function that both fits the training data and generalizes well over a specified range. If the function is parameterized so as to fit the training data perfectly, using a high-order polynomial (which is equivalent to having more neurons in the hidden layers), then the fits will be drastically different for different sets of data randomly drawn from the same underlying function. More parameter flexibility has the benefit of being able to fit anything, but at the cost of sensitivity to the variability in the data set. There is a variation between the average fit over all data sets and the fit for a single data set, i.e., a variance introduced by the fits over multiple training sets. This can be addressed to some extent by having more data points for the fitting. There is a drawback to the flexibility afforded by
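The Levenberg-Marquardt update above can be sketched for a model with a constant Jacobian (a straight-line fit), where a single damped step can be solved in closed form. The data, starting point, and damping value are illustrative; the error is defined here as e = a − t so that J = ∂e/∂w:

```python
# One Levenberg-Marquardt step dw = -[J^T J + mu*I]^(-1) J^T e for a
# two-parameter model, solving the 2x2 linear system explicitly.
def lm_step(J, e, mu):
    a00 = sum(r[0] * r[0] for r in J) + mu   # A = J^T J + mu*I
    a01 = sum(r[0] * r[1] for r in J)
    a11 = sum(r[1] * r[1] for r in J) + mu
    g0 = sum(r[0] * ei for r, ei in zip(J, e))   # g = J^T e
    g1 = sum(r[1] * ei for r, ei in zip(J, e))
    det = a00 * a11 - a01 * a01
    return [-(a11 * g0 - a01 * g1) / det,        # dw = -A^{-1} g
            -(-a01 * g0 + a00 * g1) / det]

# Fit y = w0 + w1*x to points on the line y = 2x + 1, starting from w = 0:
w = [0.0, 0.0]
xs, ts = [0.0, 1.0, 2.0], [1.0, 3.0, 5.0]
e = [w[0] + w[1] * x - t for x, t in zip(xs, ts)]   # errors e = a - t
J = [[1.0, x] for x in xs]                          # de/dw0, de/dw1
dw = lm_step(J, e, mu=0.0)   # mu = 0 reduces to a Gauss-Newton step
print([round(w[i] + dw[i], 3) for i in range(2)])   # -> [1.0, 2.0]
```

With µ = 0 and a linear model, one step lands on the exact least-squares solution; a positive µ shortens the step toward the gradient-descent direction, which is how the algorithm stabilizes itself far from the minimum.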

extra degrees of freedom in fitting the data: the amount of data required grows exponentially with the order of the fit. On the other hand, if very few parameters are available, then the fit is too restrictive; although the fit is not as sensitive to the variability in the training data, there is a fixed error, or bias, no matter how much data is collected. This problem is referred to as the bias/variance dilemma (Geman et al., 1992). It becomes acute when there are relatively few data points, especially in a high-dimensional space.

Statistical bias is the complexity restriction that the neural network architecture imposes on the degree to which the target function can be fitted accurately. The statistical bias accounts only for the degree of fitting of the given training data, not for the level of generalization. Statistical variance is the deviation of the neural network training efficacy from one data sample to another that could be described by the same target function model. The statistical variance accounts for the generalization, i.e. whether or not the neural network fits the examples without regard to the specificities of the provided data.

One approach to avoid overfitting is to deliberately add some random noise with a mean value of zero to the model, which makes it difficult for the network to fit any specific data point too closely. Another way of introducing noise is to use incremental training, i.e. updating the weights after every data point is presented, and randomly reordering the data points at the end of each training cycle. In this manner, each weight update is based on a noisy estimate of the true gradient.

6.11.1 Neural network growing and neural network pruning

To reduce the statistical bias, neural network growing is employed, where the training is initially started with few neurons. During the training process, when the

decrease in error falls below a certain threshold value, more neurons are added. To reduce the statistical variance, neural network pruning is employed, where a large network is first trained and the relative importance of the weights is assessed. The less important weights are then removed, without much affecting the performance, to obtain a smaller network.

6.12 Regularization and early stopping

Regularization involves modifying the performance function, which is the mean squared error. The risk of overfitting can be reduced if a term is added to the error function to penalize neural network models with high curvature. The usual performance function is defined as:

F = Mean Squared Error (MSE) = (1/N) Σ_{i=1}^{N} (t_i − a_i)².   (6.26)

It is modified as:

F = (1/N) Σ_{i=1}^{N} (t_i − a_i)² + α R(W),   (6.27)

where R(W) is a function of the network parameters. Regularization is performed by weight decay (Hertz et al., 1996). A common form of R(W) used to penalize the model is the sum of the squares of the weights, since overfitting with large curvature occurs when the weights of the network become too large:

F = msereg = γ mse + (1 − γ) msw,   msw = (1/n) Σ_{j=1}^{n} w_j²,   (6.28)

where γ is the performance ratio. Using this performance function causes the network to have smaller weights and biases, which makes the network response smoother and less likely to overfit.
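A minimal sketch of the regularized performance index F = γ·mse + (1 − γ)·msw described above; the sample targets, outputs, weights, and the value γ = 0.8 are invented for illustration:

```python
# Regularized performance index: a weighted sum of the mean squared
# error and the mean squared weights (weight decay).
def msereg(targets, outputs, weights, gamma=0.8):
    n = len(targets)
    mse = sum((t - a) ** 2 for t, a in zip(targets, outputs)) / n
    msw = sum(w ** 2 for w in weights) / len(weights)
    return gamma * mse + (1.0 - gamma) * msw

# Example: errors (-1, 1) give mse = 1; weights (2, 0) give msw = 2,
# so F = 0.8*1 + 0.2*2 = 1.2.
value = msereg([1.0, 3.0], [2.0, 2.0], [2.0, 0.0])
```

Large weights now cost extra, so training is pushed toward smoother network responses.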

The Bayesian framework (MacKay, 1992) provides an approach to determine the optimal regularization parameter in an automated fashion. In this framework, the weights and biases are assumed to be random variables with specified distributions. The regularization parameters are related to the unknown variances associated with these distributions. Bayesian regularization is implemented along with the Levenberg-Marquardt algorithm (Foresee and Hagan, 1997).

6.13 Early stopping

In order to achieve good generalization, overfitting needs to be controlled. This can be achieved by early stopping. In this technique, the available data set is divided into three subsets, viz. the training set, the validation set, and the testing set. The training set is used to compute the gradients and to update the weights and biases. The validation and testing sets are used to monitor the error. Generally, the errors on the training and validation sets decrease during the initial phase of training. However, when the network starts overfitting, the error on the training set continues to decrease while that on the validation set starts increasing. When the error on the validation set increases for a specified number of iterations, the training is stopped. Fig 6.7 shows a typical variation of the error on the training and validation sets during neural network training.
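The stopping rule just described can be sketched as follows; the validation-error sequence and the patience value are invented for illustration:

```python
# Early stopping: return the epoch with the lowest validation error,
# halting once the validation error has failed to improve `patience`
# times in a row (the weights from the best epoch would be kept).
def early_stopping_epoch(val_errors, patience=2):
    best_err, best_epoch, bad = float("inf"), -1, 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_err, best_epoch, bad = err, epoch, 0
        else:
            bad += 1
            if bad >= patience:
                break          # validation error keeps rising: stop training
    return best_epoch

# Validation error falls, then rises from epoch 4 onward, so training
# stops and epoch 3 (error 2.5) is kept.
best = early_stopping_epoch([5.0, 4.0, 3.0, 2.5, 2.6, 2.7, 2.8])
```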

Fig 6.7 Evaluation of error during training

6.14 Normalizing the data

If the range of the output variables is quite large, then before training it is often useful to scale the inputs and targets so that they fall within a specified range. Since the ranges of the nonlinear transfer functions, such as the log-sigmoid and hyperbolic tangent functions, are [0, 1] and [−1, +1], the inputs and targets are scaled to the range [−1, +1] with an average close to zero:

p'_i = 2 (p_i − min p) / (max p − min p) − 1,   (6.29)

where min p and max p are the minimum and maximum of the input or target values.

6.15 Neural network derivatives

In general, for function approximation problems, the neural network is used to map the functional relationship between the input and output variables. Once trained, the neural network is presented with an input vector to calculate the corresponding output. But in some cases the derivatives of the output variables with respect to the input variables

may be desired. An example of such an application is a molecular dynamics simulation, in which a multilayer neural network is trained for the energy of a molecular cluster, with the bond distances and angles that define the geometry of the cluster as the input. During the molecular dynamics simulation, the force, i.e. the first derivative of the energy with respect to the coordinates of the atoms, is required.

6.15.1 First derivative with respect to weights

For the multilayer network, the network output is given as:

a^{m+1} = f^{m+1}(W^{m+1} a^m + b^{m+1}),   for m = 0, 1, ..., M − 1,   (6.30)

where the superscript m represents the layer number, M is the number of layers of the network, and the input to the first layer is a^0 = p. Now,

∂F/∂w^m_{i,j} = (∂F/∂n^m_i)(∂n^m_i/∂w^m_{i,j})   and   ∂F/∂b^m_i = (∂F/∂n^m_i)(∂n^m_i/∂b^m_i),   (6.31)

so that

∂F/∂w^m_{i,j} = s^m_i a^{m−1}_j   and   ∂F/∂b^m_i = s^m_i,   (6.32)

where the sensitivity is defined as s^m_i = ∂F/∂n^m_i.
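The sensitivity relations above (∂F/∂w = s · a, ∂F/∂b = s) can be checked on a toy two-layer network — one tanh hidden neuron, linear output, squared-error cost; all weight values here are made up — by comparing the backpropagated gradient with a finite difference:

```python
import math

# Toy network: a1 = tanh(w1*p + b1), a2 = w2*a1 + b2, cost F = (t - a2)^2.
def grads(p, t, w1, b1, w2, b2):
    a1 = math.tanh(w1 * p + b1)
    a2 = w2 * a1 + b2
    s2 = -2.0 * (t - a2)             # output-layer sensitivity s = dF/dn
    s1 = (1.0 - a1 ** 2) * w2 * s2   # backpropagated hidden-layer sensitivity
    # dF/dw = sensitivity times the input to that layer; dF/db = sensitivity.
    return {"w1": s1 * p, "b1": s1, "w2": s2 * a1, "b2": s2}
```

A finite-difference check of, say, the w1 component confirms the backpropagated value.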

6.15.2 Derivatives of the transfer functions

The derivatives of the transfer functions are required to calculate the derivatives of the network output with respect to the network input.

Linear function:

f(x) = x,   df(x)/dx = 1,   d²f(x)/dx² = 0.   (6.33)

Log-sigmoid function. The log-sigmoid transfer function is defined as:

f(x) = 1 / (1 + e^{−x}),   (6.34)

df(x)/dx = e^{−x} / (1 + e^{−x})² = f(x)(1 − f(x)),

d²f(x)/dx² = f(x)(1 − f(x))(1 − 2 f(x)).

Hyperbolic tangent (tan-sigmoid) function. The hyperbolic tangent function is defined as:

f(x) = (e^x − e^{−x}) / (e^x + e^{−x}),   (6.35)

df(x)/dx = 1 − f(x)²,

d²f(x)/dx² = −2 f(x)(1 − f(x)²).
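The three transfer functions above, written as a small sketch (not thesis code) so that each first derivative reuses the function value — the property that makes these derivatives cheap during backpropagation:

```python
import math

def logsig(x):
    # f = 1/(1 + e^-x); f' = f(1 - f)
    f = 1.0 / (1.0 + math.exp(-x))
    return f, f * (1.0 - f)

def tansig(x):
    # f = tanh(x); f' = 1 - f^2
    f = math.tanh(x)
    return f, 1.0 - f * f

def purelin(x):
    # f = x; f' = 1
    return x, 1.0
```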

6.15.3 First derivative with respect to inputs

For the multilayer network, the network output is given as:

a^{m+1} = f^{m+1}(W^{m+1} a^m + b^{m+1}),   for m = 0, 1, ..., M − 1,   (6.36)

where the superscript m represents the layer number, M is the number of layers of the network, and the input to the first layer is a^0 = p.

The sensitivity is now defined as s^m = ∂a^M/∂n^m, which satisfies:

s^m = Ḟ^m(n^m) (W^{m+1})^T s^{m+1},   (6.37)

where Ḟ^m(n^m) is the Jacobian matrix

Ḟ^m(n^m) = diag( ḟ^m(n^m_1), ḟ^m(n^m_2), ..., ḟ^m(n^m_{S^m}) ).   (6.38)

For the output layer M, the sensitivity is s^M = Ḟ^M(n^M). The sensitivity of the hidden layers is computed as:

s^m = (df^m/dn^m) (W^{m+1})^T s^{m+1},   for m = M − 1, ..., 2, 1,   (6.39)

where df^m/dn^m is the derivative of the transfer function at layer m, and M is the number of layers. The derivative with respect to the network input is computed as:

∂a^M/∂p = (W^1)^T s^1.
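The sensitivity recursion above applied to a small 1-2-1 network (tanh hidden layer, linear output; the weight values are invented), with the analytic ∂a/∂p checked against a finite difference:

```python
import math

# 1-2-1 network: a1 = tanh(W1*p + b1), a2 = W2.a1 + b2 (linear output).
# For the linear output layer s^2 = 1; the hidden sensitivity is
# (1 - a1^2) * W2 * s^2, and da/dp = (W1)^T s^1.
def output_and_input_derivative(p, W1, b1, W2, b2):
    a1 = [math.tanh(W1[j] * p + b1[j]) for j in range(2)]
    a2 = W2[0] * a1[0] + W2[1] * a1[1] + b2
    s2 = 1.0
    s1 = [(1.0 - a1[j] ** 2) * W2[j] * s2 for j in range(2)]
    dadp = W1[0] * s1[0] + W1[1] * s1[1]
    return a2, dadp
```

In the molecular dynamics application this analytic derivative of the energy network plays the role of the force, with no numerical-differentiation error.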

6.16 Novelty detection

A multilayer feed-forward neural network can, in principle, fit any real-valued continuous function of any dimension to arbitrary accuracy, provided it has a sufficient number of neurons (Cybenko, 1989). However, the performance of the trained neural network depends upon the data set provided during the training. The training algorithm minimizes the errors between the targets and the network outputs for the training data. But this does not guarantee the neural network performance for data that was not present in the training set, usually called testing data.

Consider the example of approximating the function f(x) = x⁴. The inputs p used for training the neural network are selected within the range [−1, 1]. The trained network is then tested for new inputs in the range [−2, 2]. Fig 6.8 shows a plot of the output of a neural network trained over the range [−1, 1] and tested over the range [−2, 2].

Fig 6.8 Plot of neural network output for f(x) = x⁴

In the region of input between [−1, 1], which is the same as that of the training data, the network performed well. However, the network performs poorly outside this region, producing very large errors between targets and outputs. In the case of a neural network fit to a multivariate function, it becomes difficult to decide which input vectors lie outside the range of the inputs in the training set, and whether to rely on the neural network output for a given input vector. Also, when the dimension of the input vector is large, the distinction between interpolation and extrapolation becomes more ambiguous. Differentiating the boundaries between normal and abnormal data is therefore much more difficult, making it hard to decide whether a specific input should be accepted or rejected.

Novelty detection is a technique used to identify abnormal data points. The novelty detection algorithm should thus be capable of identifying inputs that fall outside the range of the training data. In the one-variable example above, if an input lies within [−1, 1], the corresponding output is accepted, whereas if an input lies outside the range [−1, 1], the corresponding output is rejected. Some of the novelty detection techniques (Pukrittayakamee and Hagan, 2002) are:

1. Minimum distance computation
2. Neural tree algorithm (Martinez, 1998)
3. Gaussian kernel estimator (Bishop, 1994)
4. Associative multilayer perceptron (Frosini, 1996)

6.16.1 Minimum distance computation

This method is based on the 2-norm distance between the training vectors and the testing vector. When two vectors are close, their individual elements are also close to each other. The 2-norm distance between two vectors a_i and b_j is given as:

‖a_i − b_j‖ = [(a_i − b_j)^T (a_i − b_j)]^{1/2}.   (6.40)

A vector d is constructed whose elements are the distances of a testing vector to all input vectors used for training. The minimum distance separating the testing vector from all training vectors is d_min = min(d), and the mean separation distance <d> is the average value of the elements of d. A large value of d_min indicates that the testing vector is different from the training vectors for which the neural network was trained. The next step would be to incorporate this testing vector into a new training data set, along with the original training data, and to train a new neural network on this new data set. On the other hand, a small value of d_min suggests that there is at least one point in the training data set that is close to the testing vector, and the testing vector may or may not be included in the new training data set. This is decided based on the average distance <d> as well. If <d> is also small, there are many data points in the training set close to the testing point, and the testing point need not be included in the new training data set; in other words, the density of points close to the testing point is high. But if d_min is small and <d> is large, then although there is at least one training point close to the testing point, the density of points close to the testing point is small. In other words, this

point lies in a region isolated from the other data points in the training set, and this testing point should be incorporated into the new training set to improve the overall neural network fit.

A selection procedure based only on the magnitude of d_min may not be able to detect points that are critical to improving the fit: in regions where the gradient is large, many points are required to obtain an accurate fit. The network outputs and targets can also be used for novelty detection. The minimum distance algorithm is modified to include the outputs as well. For an input vector p^T, the error between the output a^T and the target t^T is written as:

e^T = t^T − a^T = F(p^T) − a^T,   (6.41)

where F is the function to be approximated. F can be expanded about the training input vector closest to p^T using a Taylor series expansion as:

e^T = F(p^0) + ∂F/∂p|_{p=p^0} (p^T − p^0) + ... − a^T = (t^0 − a^T) + ∂F/∂p|_{p=p^0} (p^T − p^0) + ...,   (6.42)

where p^0 is the training vector and ∂F/∂p|_{p=p^0} is the first derivative of the function with respect to the input, evaluated at the training vector. Now, if p^0 is the closest vector in the training set to p^T, with minimum distance d_min, and if the higher-order terms can be ignored, then:

e^T ≈ (t^0 − a^T) + ∂F/∂p|_{p=p^0} (p^T − p^0).   (6.43)

So, the network error depends not only on the distance between the testing vector p^T and the closest training vector p^0, but also on the gradient evaluated at the training vector.

Therefore, if the underlying function has a large slope, the error can be large even though the minimum distance is small. The underlying function is generally not known, and therefore its gradient cannot be computed. Pukrittayakamee and Hagan (2002) found that an approximation to the gradient can be used if the minimum distance computation is modified to include the distance between the network output and the target. The modified vector is [p; αt], where p is the input vector and t is the network output vector:

d = [ ‖[p^T; α t^T] − [p_1; α a_1]‖ ; ‖[p^T; α t^T] − [p_2; α a_2]‖ ; ... ; ‖[p^T; α t^T] − [p_N; α a_N]‖ ],   (6.44)

where α is greater than zero, and N is the total number of training points.
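The minimum-distance test above can be sketched as follows; the training vectors and the acceptance threshold are made up, and the output-augmented variant amounts to appending α times the output/target to each vector before calling the same routine:

```python
import math

# 2-norm distances from one testing vector to every training input.
def distances(test_p, train_ps):
    return [math.sqrt(sum((a - b) ** 2 for a, b in zip(test_p, p)))
            for p in train_ps]

# Flag the testing vector as novel when d_min exceeds a threshold;
# d_mean (<d>) indicates how dense the training data are near the point.
def novelty(test_p, train_ps, threshold):
    d = distances(test_p, train_ps)
    d_min = min(d)
    d_mean = sum(d) / len(d)
    return d_min > threshold, d_min, d_mean
```

A point flagged as novel would be sent to the electronic structure program and added to the training database.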

CHAPTER 7

Ab Initio Molecular Dynamics (AIMD)

7.1 Approach

This chapter introduces an integrated approach for obtaining ab initio quantum mechanical potential energy surfaces and force fields for use in molecular dynamics simulations. The method involves the following steps:

1. Perform MD simulations using an empirical potential that can reasonably model the interatomic interactions of the material. During the simulation, store the configurations of atoms, comprising the number of atoms within the cut-off radius of a particular atom and the coordinates of the respective atoms. These configurations make up the initial set of configurations and serve as the starting point.

2. Perform electronic structure calculations of the energies and force fields for the configurations stored in step 1, using an electronic structure program such as Gaussian 98.

3. Train a multilayer neural network to fit the scaled potential energy (obtained by shifting the zero of potential energy) and the force fields for the configurations.

4. Perform an MD simulation using the output from the neural network for the energy and forces. Monitor the energy conservation and store more configurations during the simulation.

5. Compare the configurations stored while using the neural network in step 4 with those stored in step 1 and, using the novelty detection technique, find the outlier configurations, i.e. the configurations that are different from those in the training set. Form a new database of configurations by augmenting the original data set with these new outlier configurations.

6. Perform electronic structure calculations on the new set of configurations using an electronic structure program such as Gaussian 98.

7. Retrain the neural network on this new database of configurations.

8. Repeat steps 4-7 until the configurations encountered during the MD simulation are no longer different from those in the database on which the neural network is trained. This implies that the neural network is properly trained over the range of configuration space involved in the simulation.

7.2 Choice of the coordinate system

Cartesian coordinates provide an efficient representation of the molecular geometry for the computer and have the advantage of including the actual spatial orientation of the molecule. The position of an atom is specified using its X, Y, and Z coordinates. Hence, to specify a molecular geometry in Cartesian coordinates, one needs to specify 3N coordinates, where N is the number of atoms in the molecular system. The use of internal coordinates allows one to specify the geometry with respect to a body-fixed frame so

that the translational and rotational degrees of freedom of the system can be ignored and the geometry is specified using the 3N − 6 internal coordinates of the molecule.

7.2.1 Internal coordinates

Internal coordinates are generally the most efficient way to describe the potential energy as a function of the atomic coordinates. However, since the internal coordinates are coupled, it is convenient to solve the equations of motion during an MD simulation in the Cartesian coordinate system. Consequently, it becomes necessary to convert a set of Cartesian coordinates to the corresponding internal coordinates. There is, however, no unique set of Cartesian coordinates for a given set of internal coordinates, because of the translational and rotational degrees of freedom involved and the nonlinear dependence of the internal coordinates upon the Cartesian coordinates. For converting internal coordinates to Cartesian coordinates, some constraints have to be imposed, such as fixing the position of the center of mass at the origin for a given configuration and aligning a coordinate axis along a specific atom (the first neighbor, represented as atom 2 in Gaussian 98).

Internal coordinates are specified using bond distances, bond angles, and dihedral (torsional) angles. Fig 7.1 shows the variables involved in internal coordinates. The dihedral angle is the angle between the two planes passing through atoms i, j, k and j, k, l, respectively. It is measured as the angle between the vectors normal to these planes. Unlike the bond angle, which varies from 0° to 180°, the dihedral angle varies from 0° to 360°.

Fig 7.1 Internal coordinates: a) bond distance, b) bond angle, and c) dihedral angle

The dihedral angle is calculated as:

cos φ_ijkl = (ê_ij × ê_jk) · (ê_jk × ê_kl) / (sin θ_ijk sin θ_jkl),   (7.1)

where ê_ij is the unit vector pointing from atom i to atom j. Alternatively, if the equations of the planes formed by atoms i, j, k and j, k, l are expressed as:

a_1 x + b_1 y + c_1 z + d_1 = 0,   a_2 x + b_2 y + c_2 z + d_2 = 0,   (7.2)

then the normal vectors are n_1 = (a_1, b_1, c_1) and n_2 = (a_2, b_2, c_2), and the dihedral angle is defined as:

cos φ = n_1 · n_2 / (‖n_1‖ ‖n_2‖) = (a_1 a_2 + b_1 b_2 + c_1 c_2) / (√(a_1² + b_1² + c_1²) √(a_2² + b_2² + c_2²)).   (7.3)

The dihedral angle is either positive or negative depending on the relative rotation of the configuration with respect to an observer. Only the absolute value of the dihedral angle can be obtained using Eqn. (7.1) or Eqn. (7.3), and some sign convention has to be followed to determine the sign of the dihedral angle. Dihedral angles can be viewed using Newman projections. It may be noted that Newman projections are drawings used to represent the positions of different parts of molecules with respect to each other in space. The structure is viewed along the bond between two adjacent atoms, and the bonds to the other

atoms in the configuration are drawn as projections in the plane of the paper. When a dihedral angle is viewed from front to rear, if the rear atom is clockwise from the front atom, the dihedral angle is positive; if the rear atom is anticlockwise, the dihedral angle is negative. Fig 7.2 shows the sign convention for the dihedral angle.

Fig 7.2 Sign convention for the dihedral angle

7.3 Input vector to the neural network

A multilayer neural network is used to fit the energies or forces from the ab initio calculations. The input vector to the neural network should consist of variables that uniquely define the geometry of a given configuration in a body-fixed frame of reference.

Fig 7.3 Configuration of 5 silicon atoms
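The dihedral-angle formula above can be implemented directly from the four atomic positions; this sketch (not thesis code) returns the unsigned angle in degrees, leaving the sign convention of Fig 7.2 to the caller:

```python
import math

def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1], u[2]*v[0] - u[0]*v[2], u[0]*v[1] - u[1]*v[0])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Unsigned dihedral angle (degrees) for atoms i-j-k-l: the angle between
# the normals of the planes (i, j, k) and (j, k, l).
def dihedral(pi, pj, pk, pl):
    b1 = [pj[t] - pi[t] for t in range(3)]
    b2 = [pk[t] - pj[t] for t in range(3)]
    b3 = [pl[t] - pk[t] for t in range(3)]
    n1, n2 = cross(b1, b2), cross(b2, b3)
    cosphi = dot(n1, n2) / math.sqrt(dot(n1, n1) * dot(n2, n2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosphi))))
```

The clamp on `cosphi` guards against round-off pushing the argument of `acos` slightly outside [−1, 1].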

Fig 7.3 shows a typical configuration of 5 silicon atoms, where Si1 is the central atom and atoms Si2, Si3, Si4, and Si5 are bonded to the Si1 atom. The dotted line for bond Si1-Si3 shows that atom Si3 projects into the plane of the paper, the solid lines for bonds Si1-Si4 and Si1-Si5 show that atoms Si4 and Si5 project out of the plane of the paper, and atoms Si1 and Si2 are in the plane of the paper. The input vector to the neural network for the configuration shown in Fig 7.3 can be constructed in two ways. One approach is to form the input vector from all the bond distances. For example, the input vector could be formed as:

in = [r_12; r_13; r_14; r_15; r_23; r_24; r_25; r_34; r_35; r_45],

where r_12, r_13, r_14, and r_15 are the bond distances between atom Si1 and atoms Si2, Si3, Si4, and Si5, respectively. Such a vector uniquely specifies the geometry of a given configuration. In this case, the number of variables involved is n(n − 1)/2, where n is the size of the cluster (for a cluster of 5 silicon atoms, n = 5). The other approach is to form the input vector from the internal coordinates. This formulation requires 3n − 6 variables, as opposed to the n(n − 1)/2 variables in the case of the input vector consisting of all bond distances. The configuration in Fig 7.3 can be written in terms of internal coordinates using 4 bond distances, 3 bond angles, and 2 dihedral angles as:

in = [r_12; r_13; r_14; r_15; θ_3; θ_4; θ_5; φ_43; φ_54],

where r_12, r_13, r_14, r_15 are the bond distances between atom Si1 and atoms Si2, Si3, Si4, and Si5, respectively, θ_3, θ_4, θ_5 are the bond angles that atoms Si3, Si4, and Si5 make with bond 1-2, and φ_43, φ_54 are the dihedral angles. While measuring the bond angles and dihedral angles, bond 1-2 (Si1-Si2) is used as the reference line, so that the input vector

so constructed is consistent with the convention followed in Gaussian 98, which places the Z-axis along bond 1-2 when internal coordinates are used.

It should be noted that the length of the input vector formed using all bond distances varies as the square of the cluster size n, whereas the length of the input vector formed using internal coordinates varies linearly (as three times) with the cluster size. As a result, for a cluster size of five or greater, the input vector formed using internal coordinates involves fewer variables than the input vector formed from all bond distances. The use of internal coordinates reduces the dimension (length) of the input vector used for training the neural network. Consequently, the input space is less sparse. This can result in faster convergence during training with fewer training configurations. It also reduces the overall complexity involved in mapping the input variables to the output variables.

7.4 Sorting and ordering the input vector for neural network training

Consider a function of two variables, f(x,y) = x² + y², which is a paraboloid. Fig 7.4 shows the contour plot of the function. Since the function is parabolic, the contours are circles, and each circle represents a curve of constant function value, i.e. the points on a contour represent the set of [x, y] values that result in the same value of f. It should be noted that f(2,4) = f(4,2), since the points [2,4] and [4,2] lie on the same contour, corresponding to f = 20. Now, if a neural network is to be trained for this function of two variables (x,y), then the input vector corresponding to f = 20 could be specified as either [2; 4] or [4; 2]. The difference between these two input vectors is the different ordering of the variables x

and y. It is possible to reduce the configuration space that must be trained if the symmetry involved in specifying the input vector is taken into consideration. The input can be ordered so that the first element is x and the second element is y (or vice versa). So, instead of training the network with two inputs differing only in their ordering and having the same output, the network can be trained for only one ordered input vector. Fig 7.4 (b) shows the contour plot of the function; the diagonal line separates the two input vectors having the same function value but differing only in their order of representation.

Fig 7.4 Contour plot of the function f(x,y) = x² + y²

As shown in Fig 7.4 (b), only the upper or the lower region about the diagonal needs to be considered for training. Thus, for a two-variable function, ordering the input vector to the network reduces the configuration space required for training by half. Such an approach proves to be more efficient in situations where the input vector

dimension is large and the number of data points is limited, which results in sparse data points in a high-dimensional space, making the network training difficult.

7.4.1 Procedure for sorting and ordering the input vector for AIMD

It is important to take the symmetry of the configuration into account. For an electronic structure program such as Gaussian 98, the energy and force fields for a given Si5 cluster are unaltered if two or three silicon atoms are exchanged. However, the neural network training algorithm does not take into account the symmetry involved in the input vector. Therefore, either the neural network must be trained to recognize different orderings of the atoms within the same configuration, or a different procedure for constructing the input vector should be adopted that makes this unnecessary. Consider the configuration shown in Fig 7.5: both (a) and (b) show the same configuration, but with a different ordering of atoms Si2 and Si3.

Fig 7.5 Effect of ordering of atoms in a configuration
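The sorting-and-ordering procedure of section 7.4.1 — compute the bond distances to the central atom, then relabel the neighbors in sorted order before the input vector is assembled — can be sketched as follows; the atom labels and distances are invented:

```python
# Relabel the neighbor atoms of the central atom in order of increasing
# bond distance, so that all permutations of the same cluster map to a
# single canonical input vector.
def order_neighbors(bond_lengths):
    # bond_lengths: {atom_id: distance to the central atom}
    return sorted(bond_lengths, key=lambda atom: bond_lengths[atom])

# Made-up Si5 cluster: neighbor atoms 2-5 with their distances to Si1.
neighbors = {2: 2.35, 3: 2.10, 4: 2.50, 5: 2.20}
# order_neighbors(neighbors) -> [3, 5, 2, 4]
```

Any permutation of the labels 2-5 for the same geometry yields the same sorted order, which is what removes the (n − 1)! redundancy from the training set.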

The configurations shown in Fig 7.5 (a) and (b) both have the same energy and the same forces along the bonds, and either of the two could be used for the ab initio calculations. But the two configurations have different input vectors for neural network training, because the bond used as the reference line for measuring the angles is different in the two cases: bond 1-2 in case (a) and bond 1-3 in case (b). The input vectors for these two configurations can be written as:

in(a) = [r_12; r_13; r_14; r_15; θ_3; θ_4; θ_5; φ_43; φ_54],

in(b) = [r_13; r_12; r_14; r_15; θ_2; θ_4; θ_5; φ_42; φ_54].

For any non-equilibrium geometry, different angles are involved in the two input vectors, so for the neural network the configurations in Fig 7.5 (a) and (b) are two different inputs. For proper generalization of the fit, the neural network would have to be trained with both input vectors having the same energy as the output. For a given configuration there are (n − 1)! symmetries (permutations) possible with the same energy and force fields, where n is the size of the cluster. As a result, the number of configurations required for training the neural network grows rapidly as the number of atoms within the cluster increases, and it becomes unfeasible to consider all the symmetries. For example, if there are N different configurations for which ab initio calculations are performed, then for the neural network training there would be N(n − 1)! data points, considering all symmetries. For a cluster of 5 silicon atoms, (n − 1)! is 24; hence, with 10,000 different configurations from Gaussian 98, there would be 240,000 data points for neural network training. Considering all possible permutations of the same configuration proves to be an inefficient approach for neural network training, since it not only increases the

number of configurations required for training by a factor of (n − 1)!, but also increases the configuration space and the complexity of the function to be trained. To circumvent this problem it becomes necessary to devise a strategy for training the neural network without considering the permutations. One way to achieve this is to order the input vector, which obviates the need to consider all permutations without affecting the generalization ability of the neural network.

Consider the configuration shown in Fig 7.3. First, calculate all the bond distances to the central atom Si1, i.e. r1-2, r1-3, r1-4, and r1-5; sort these bond distances and order the atoms Si2, Si3, Si4, and Si5 accordingly; then calculate the angles with bond 1-2 as the reference line (2 corresponds to atom 2 after sorting and ordering). Ordering the input vector circumvents the need to consider all permutations and also reduces the configuration space of the input variables required for training, and hence the overall complexity of the function involved. As a result, the time required for training the neural network is reduced. Also, a simpler neural network architecture (with fewer neurons) can be implemented, and the accuracy of the fit can be improved for a given number of neurons. The same procedure should be followed during the MD/MC simulation to construct the input vector used to calculate the energy and forces with the trained network. This procedure results in a slight increase in the computational time during the simulation, but the advantages offered by sorting and ordering the input vector offset this slight increase.

7.5 Calculation of forces during the MD simulation

Molecular dynamics simulation involves the numerical integration of the classical Newtonian equations of motion for a system of interacting atoms over a period of time. According to Newton's second law:

F_i = m_i a_i,   (7.4)

where F_i is the force on atom i, and m_i and a_i are the mass and acceleration of atom i. The force is the negative gradient of the potential energy with respect to the position of the atom:

F_i = −∇_i V(r_1, r_2, ..., r_N),   (7.5)

where V is the potential energy function, r_i is the position vector of atom i, and x_i, y_i, and z_i are the Cartesian coordinates of atom i:

r_i = x_i î + y_i ĵ + z_i k̂,   (7.6)

∇_i = î ∂/∂x_i + ĵ ∂/∂y_i + k̂ ∂/∂z_i.   (7.7)

The force on an atom can be computed in two ways. One approach is to first train a neural network for the forces and use this network in the simulation. For a cluster of n atoms, at least 3n − 6 variables are required to define the geometry uniquely. If a neural network were trained for the forces, such a network would have 3n − 6 input variables (bond distances and angles) and also 3n − 6 output variables (forces along the bond distances and angles). It would be a very demanding task to train a single network to perform this, since the functional dependence of the angular forces on the input variables may be very different from the functional dependence of the forces along the bonds; it would be like trying to fit several different functional relationships for a given set of input variables. Although such a network could be trained, with a large number of neurons and consequently more training time, a better approach is to train the network for the potential energy of the molecular cluster and then, during the simulation, differentiate

this network with respect to the input variables to obtain the forces. For a specified accuracy, such a network may require fewer neurons than a network that tries to fit all the derivatives directly. This reduces the training time; on the other hand, during the simulation the output from the neural network has to be differentiated to calculate the forces, which introduces extra calculations. The approach involving neural network differentiation is feasible only if the time required for differentiating a possibly smaller network is less than the time required to compute the derivatives directly from a larger network of comparable accuracy. Since the neural network uses analytically differentiable transfer functions, such as sigmoid functions, for the mapping, it can be differentiated analytically without introducing any numerical errors.

For the multilayer network, the network output is given as:

a^{m+1} = f^{m+1}(W^{m+1} a^m + b^{m+1}),   for m = 0, 1, ..., M − 1,   (7.8)

where the superscript m represents the layer number, M is the number of layers of the network, and the input to the first layer is a^0 = p.

The sensitivity is defined as:

s^m = ∂a^M/∂n^m = Ḟ^m(n^m) (W^{m+1})^T s^{m+1},   (7.9)

where Ḟ^m(n^m) is the Jacobian matrix

Ḟ^m(n^m) = diag( ḟ^m(n^m_1), ḟ^m(n^m_2), ..., ḟ^m(n^m_{S^m}) ).   (7.10)

For the output layer M, the sensitivity is s^M = Ḟ^M(n^M).

The sensitivity of the hidden layers is computed as

s^m = F′^m(n^m) (W^(m+1))^T s^(m+1) for m = M-1, ..., 2, 1, (7.11)

where F′^m(n^m) contains the derivatives of the transfer function at layer m, and M is the number of layers. The derivative of the network output with respect to the network input is then

∂a^M/∂p = (W^1)^T s^1. (7.12)

7.6 Sampling technique

For a typical configuration of 5 silicon atoms, as shown in Fig 7.3, the input vector in internal coordinates is specified as

q = [r2; r3; r4; r5; θ3; θ4; θ5; φ43; φ54].

The neural network is trained for the energy computed from Gaussian 98 as a function of these nine variables in internal coordinates. Although it is possible to compute energies and forces at the MP4(SDQ) and DFT levels for thousands of configurations, this number of configurations is still small compared to the totality of the configuration space available for the system. If reasonable ranges are taken for each coordinate and each range is divided into ten equal segments, then the total number of possible configurations would be 10^9. This number of configurations is impractical considering present computational capabilities. However, because of the bonding restrictions present in the system, not all of the configuration space is of interest in the dynamical studies. For example, in the simulation

of bulk silicon material, configurations such as Si-Si-Si-Si-Si, i.e., five silicon atoms in a linear chain, are of no interest. In any system, the number of important configurations forms a small subset of the total configuration space. Therefore, the key is to sample only the important region of the total configuration space. Since an empirical potential is used during the first step to store the configurations, this set of configurations need not necessarily cover the entire configuration space of the ab initio potential energy. The empirical potential need not be highly accurate, but it should provide a reasonable representation of the interatomic interactions of the material. The configurations generated in an ensemble of trajectories using an empirical potential comprise the subset of configuration space that has to be sampled. The procedure that has been employed attempts to sample the entire configuration space in an iterative fashion. A set of trajectories using this empirical potential is computed. Next, the energies and forces for these configurations are computed using Gaussian 98. These data make up the first approximation for the ab initio potential energy and force field. Then a neural network is trained on the ab initio database, with the input vector consisting of internal coordinates and the output being the energy or force components. Now a second set of MD trajectories is computed using this neural network force field rather than the original empirical potential. During these calculations atomic configurations are stored and their energies and forces are computed using Gaussian 98. After several iterations the neural network is expected to converge to the ab initio potential energy and force field. Given a sufficient number of neurons, a neural network can be trained to approximate any arbitrary function (Cybenko, 1989). However, the performance of the

trained neural network will depend upon the data set provided during training. A training algorithm minimizes the errors between the targets and the neural network outputs for the training data set. This does not guarantee the neural network performance for data that were not present in the training set, usually called the testing data. In the case of neural network fitting of a multidimensional function, it becomes difficult to decide which input vectors lie outside the range of the training set. Moreover, when the dimension of the input vector is large, the distinction between interpolation and extrapolation becomes more ambiguous. Therefore, drawing the boundary between normal and abnormal (outlier) data is much more complicated, making it difficult to decide whether a specific input should be accepted or rejected.

7.7 Novelty detection

Novelty detection is a technique used to identify abnormal data points. Thus, a novelty detection algorithm should be capable of identifying the inputs that fall outside the range of the training data. Some of the novelty detection techniques (Pukrittayakamee and Hagan, 2002) are:

1. Minimum distance computation
2. Neural tree algorithm (Martinez, 1998)
3. Gaussian kernel estimator (Bishop, 1994)
4. Associative multilayer perceptron (Frosini, 1996)

In the context of AIMD, novelty sampling algorithms based on the current density of configuration points in the database guide the selection of configurations to be included

to form a new database. Regions with a low density of configurations are preferentially included in the sample. Let q_i be the input vector corresponding to the i-th configuration in the current database. By computing trajectories using the neural network output for the force field, a new configuration q_n is generated. Qualitatively, this configuration should be included in the database if q_n lies in a region of space where the density of points in the database is low. Minimum distance computation is used for the novelty detection. It is based on the 2-norm difference between vector q_i and vector q_n:

|q_i - q_n| = [(q_i - q_n)^T (q_i - q_n)]^(1/2). (7.13)

A vector d is now defined whose elements are the distances computed using Eqn. (7.13) for each of the N configuration points in the database. The minimum distance separating point q_n from the other points in the database is the minimum element of d, denoted d_min = min(d). The mean separation distance <d> is the average value of the elements of d. The configuration point q_n will be added to the database with a high probability if d_min or <d> is large, but with low probability if both d_min and <d> are small. A selection algorithm based solely upon the magnitude of d_min has the drawback that it will not detect points that are needed in the database in regions where the configuration density is not low. This is the case when the function is changing sharply: in regions where the gradient of the function is large, more points are required to obtain an accurate neural network fit than in regions with smaller gradients, even though the configuration density is not low. But the underlying function is unknown, so it is impossible to calculate the gradient at an arbitrary configuration point. Pukrittayakamee and Hagan (2002) have found that if the minimum distance

between a set of modified configuration vectors is used in place of d_min, then the effect of the function gradient can be incorporated into the selection algorithm. A modified vector is created as

m_i = [q_i O_i], (7.14)

where O_i is the output of the neural network for the configuration point q_i. When the minimum distance between m_i and m_n is used in place of d_min, the effect of the function gradient is incorporated. If the gradient between points q_i and q_n is large, then the difference between the corresponding neural network outputs O_i and O_n will be large, making the magnitude of |m_i - m_n| large, and hence the point q_n will be identified as a novelty point.
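The minimum-distance test of Eqn. (7.13) and the modified vectors of Eqn. (7.14) can be sketched as follows. The database, the probe point, and the surrogate "network output" (a simple quadratic) are synthetic stand-ins for illustration:

```python
import numpy as np

# Sketch of Eqns. (7.13)-(7.14): 2-norm distances from a probe configuration
# q_n to a synthetic database, reduced to d_min and <d>, then repeated with
# vectors augmented by a surrogate "network output" O_i (here a quadratic).
rng = np.random.default_rng(1)
Q = rng.uniform(-1, 1, size=(500, 2))            # stored configurations q_i
O = (Q ** 2).sum(axis=1)                         # surrogate outputs O_i
M = np.column_stack([Q, O])                      # modified vectors m_i = [q_i O_i]

q_n = np.array([0.9, 0.9])                       # new configuration (steep region)
m_n = np.append(q_n, (q_n ** 2).sum())

d = np.linalg.norm(Q - q_n, axis=1)              # Eqn. (7.13) for every q_i
d_min, d_mean = d.min(), d.mean()                # d_min and <d>
d_min_mod = np.linalg.norm(M - m_n, axis=1).min()

# Augmenting each vector can only increase the minimum distance, so steep
# regions are flagged as novel even when the configuration density is high.
print(d_min <= d_mean, d_min_mod >= d_min)
```

Since every augmented distance is at least as large as its unaugmented counterpart, the augmented minimum distance never decreases; it grows most where the fitted function changes rapidly.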

CHAPTER 8

Results and Discussion

In the present study the approach was used to develop a potential energy hypersurface for a cluster of five silicon atoms. Ab initio calculations were performed with the Gaussian 98 electronic structure program using density functional theory. A feedforward multilayer neural network was trained using Matlab. The accuracy of the trained neural network was iteratively improved using the modified novelty detection technique. The equilibrium configuration of silicon is tetrahedral, in which every silicon atom is bonded to four other silicon atoms in a tetrahedral arrangement, as shown in Fig 8.1.

Fig 8.1 Silicon cluster

To model this system, a cluster of at least five silicon atoms should be considered, including the central atom and the four other atoms bonded to it. MD simulations of machining of a silicon workpiece were performed. The Tersoff potential (Tersoff, 1989) was

employed during the initial step of the iterative process. Very high energies are present during the machining process, which result in atomic configurations far from the equilibrium tetrahedral structure, especially in the chip and near the tool because of the subsurface deformation. This increases the size of the subset of configuration space that must be sampled. MD simulations of machining were performed with a +5° rake angle, a 10° clearance angle, and a 1 Å depth of cut, at a cutting speed of 49. m s⁻¹. The total number of atoms in the workpiece was ,675. Throughout the MD simulations, the configurations of the five-atom silicon clusters are stored; these are present predominantly in the chip in front of the tool in the shear zone and within a few unit-cell distances beneath the tool in the workpiece.

8.1 Ab initio calculations

The energies and force fields at the central atom were computed for 18,000 configurations using the Gaussian 98 electronic structure program, with density functional theory (DFT) calculations employing the 6-31G** basis set and the B3LYP procedure for incorporating the correlation. The input was specified in internal coordinates, i.e., the Z-matrix format. It is important to take symmetry into account, since merely switching the ordering of the atoms does not change the energy and force fields resulting from the ab initio calculations. The neural network should therefore be trained to recognize the different possible orderings of the same configuration. One option to achieve this is to train the neural network on all possible permutations of a configuration. For a cluster of n atoms there are (n-1)! possible permutations, so with 18,000 five-atom silicon clusters this would require 18,000 × (5-1)!, i.e., 432,000, data points for the neural

network training, which becomes unfeasible. An alternative is to order the input vector, which obviates the need to consider all permutations without affecting the generalization ability of the neural network. The Z-matrix input describing the Si5 clusters comprises four Si-Si bond distances, three Si-Si-Si angles, and two dihedral angles. Before each calculation, the four bond distances were listed in increasing order of their deviation from the Si-Si equilibrium bond distance. This listing determines the Z-matrix input for the bond angles and dihedral angles. Bond 1-2 was taken as the reference for measuring the angles. This procedure obviates the need to train the neural network to recognize symmetry and simplifies the training. Fig 8.1 shows a typical configuration of five silicon atoms. The variables involved in the Z-matrix input are r2, r3, r4, r5, θ3, θ4, θ5, φ43, and φ54, where r2, r3, r4, and r5 are the bond distances of silicon atom #1 with atoms 2, 3, 4, and 5, respectively; θ3, θ4, and θ5 are the angles that bonds 1-3, 1-4, and 1-5 make with bond 1-2; and φ43 and φ54 are the two dihedral angles. When the trained neural network is utilized for computing the energy and force fields, the same ordering of the input vector is used. The database of configurations obtained from the machining simulations was augmented with an additional 10,000 five-atom silicon configurations near equilibrium. These configurations were obtained at a temperature of 300 K in the silicon workpiece by following the vibrational motions of the lattice using molecular dynamics with the Tersoff potential; the configurations were stored at equally spaced time intervals during the calculations. The energies and force fields for these configurations were also computed using Gaussian 98, with density functional theory (DFT) calculations employing the 6-31G** basis set and the B3LYP procedure for incorporating the correlation.
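The permutation count and the bond-ordering step can be sketched as follows. The 2.35 Å equilibrium bond length and the sample distances are assumptions for illustration only:

```python
import numpy as np
from math import factorial

# Sketch of the symmetry-handling step: rather than training on all (n-1)!
# atom orderings, the four bond distances are sorted by their deviation from
# the Si-Si equilibrium bond length. The 2.35 Å equilibrium value and the
# sample distances below are illustrative assumptions.
n = 5
print(factorial(n - 1), 18_000 * factorial(n - 1))   # 24 orderings -> 432,000 points

R_EQ = 2.35                                # assumed Si-Si equilibrium distance (Å)
r = np.array([2.30, 2.60, 2.36, 2.20])     # four sample bond distances (Å)
order = np.argsort(np.abs(r - R_EQ))       # increasing deviation from R_EQ
r_sorted = r[order]                        # bond closest to equilibrium first
print(r_sorted.tolist())
```

In a full implementation the angles and dihedrals would be re-derived from the reordered Z-matrix; here only the distance ordering itself is shown.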

8.2 Neural network training

The next step is to fit the neural network to the ab initio energy. The forces can then be calculated during the MD simulation by differentiating the neural network output, i.e., the energy, with respect to the inputs, i.e., the internal coordinates. The zero of energy is shifted so that the energy of five infinitely separated silicon atoms corresponds to zero; consequently, the energy of a cluster of five silicon atoms is negative, which is the convention followed for the empirical potential as well. The inputs (the internal coordinates) and the output (the energy) are scaled so that the range of each variable over the entire database is [-1, +1], using Eqn. (8.1):

p_n = 2(p - min p)/(max p - min p) - 1, (8.1)

where p is the variable to be scaled, min p and max p are the minimum and maximum values of that variable in the input and output vectors over the entire database of configurations, and p_n is the scaled value corresponding to p. The scaled database is divided into three sets: a training, a validation, and a testing set. Approximately 10% of the data points were used for the validation set, 10% for the testing set, and the rest for the training set. A multilayer feedforward neural network was used to fit the data. The number of neurons in the input and output layers is determined by the number of input and output variables, respectively.
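Because the transfer functions are analytic, the derivative of the network output with respect to its inputs can be evaluated exactly by back-propagating sensitivities through the layers. A minimal sketch for a two-layer tanh network, with random stand-in weights (not trained values), checked against a finite-difference estimate:

```python
import numpy as np

# Sketch (hypothetical weights): the input-gradient of a two-layer network
# a2 = W2 tanh(W1 p + b1) + b2, obtained by back-propagating sensitivities,
# and verified against a central finite-difference estimate.
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((5, 9)), rng.standard_normal(5)  # hidden layer (9 inputs)
W2, b2 = rng.standard_normal((1, 5)), rng.standard_normal(1)  # linear output layer

def net(p):
    n1 = W1 @ p + b1                      # net input of the hidden layer
    return (W2 @ np.tanh(n1) + b2)[0], n1

def input_gradient(p):
    _, n1 = net(p)
    s2 = np.ones((1, 1))                  # sensitivity of the linear output layer
    s1 = np.diag(1 - np.tanh(n1) ** 2) @ W2.T @ s2   # hidden-layer sensitivity
    return (W1.T @ s1).ravel()            # da2/dp = W1^T s1

p = rng.uniform(-1, 1, size=9)
g = input_gradient(p)
eps = 1e-6
g_fd = np.array([(net(p + eps * e)[0] - net(p - eps * e)[0]) / (2 * eps)
                 for e in np.eye(9)])
print(np.allclose(g, g_fd, atol=1e-5))
```

The analytic gradient agrees with the finite-difference estimate to roundoff, illustrating that differentiating the trained network introduces no numerical error of its own.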

The neural network specifications are listed in Table 8.1.

Table 8.1 Neural network specification for silicon clusters

Number of layers:     2 (one hidden layer and one output layer)
Number of neurons:    9 in the input layer, corresponding to the number of input variables;
                      45 in the hidden layer;
                      1 in the output layer, corresponding to the number of output variables
Transfer functions:   Hyperbolic tangent in the hidden layer; linear in the output layer
Training algorithm:   Levenberg-Marquardt with Bayesian regularization

Designing a network involves deciding upon the number of neurons in the hidden layer. The network should have enough parameters to learn the underlying function. On the other hand, if there are too many parameters available for fitting a relatively simple function, or if the number of data points is far less than the number of parameters, then the network can overfit the data, i.e., it fits every data point very closely while not generalizing well over the configuration space. Generally, the underlying function is unknown, which makes it difficult to decide the number of neurons. To avoid overfitting it is essential to employ an early-stopping procedure (Sarle, 1995). In this method, the root-mean-square deviation of the neural network output from the data in the validation set is computed after each iteration during training. Initially, the root-mean-square errors for both the training and validation sets decrease because the accuracy of the fit is improving. At some point during training, overfitting of the data can begin. This improves the apparent accuracy for the training set but decreases the accuracy for the validation and testing sets, since these points are not included in the training set. For optimum performance from a specific neural network architecture, the training should be continued until the error for the validation set is at its minimum value. Along with early

stopping, regularization is also used, which penalizes network models with high curvature, since high curvature normally results from overfitting the data. Bayesian regularization provides an approach for determining the optimal regularization parameter in an automated fashion. In this framework the weights and biases are assumed to be random variables with a specified distribution.

Fig 8.2 Comparison of neural network output with ab initio energy for the training set
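The early-stopping procedure described above can be sketched on synthetic data; the model, learning rate, and split sizes below are illustrative assumptions, not the thesis settings:

```python
import numpy as np

# Early-stopping sketch on synthetic data (illustrative model and split):
# track the validation RMSE after every gradient step and remember the
# parameters at the point where it is smallest.
rng = np.random.default_rng(8)
x = rng.uniform(-1, 1, size=120)
y = np.sin(2 * x) + 0.05 * rng.standard_normal(120)

X = np.vander(x, 8)                          # degree-7 polynomial features
Xtr, ytr, Xva, yva = X[:80], y[:80], X[80:], y[80:]   # ~2/3 train, 1/3 validation

w = np.zeros(8)
best_w, best_err = w.copy(), np.inf
for _ in range(2000):
    w -= 0.1 * Xtr.T @ (Xtr @ w - ytr) / len(ytr)     # gradient step on train MSE
    val_err = np.sqrt(np.mean((Xva @ w - yva) ** 2))  # validation RMSE
    if val_err < best_err:                            # keep the best point seen
        best_err, best_w = val_err, w.copy()

print(best_err < np.sqrt(np.mean(yva ** 2)))          # better than the zero model
```

Training continues past the validation minimum, but the returned parameters are those stored at that minimum, which is the early-stopping point.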

Fig 8.3 Comparison of neural network output with ab initio energy for the validation set

Fig 8.4 Comparison of neural network output with ab initio energy for the testing set

Figs 8.2 to 8.4 show the plots of the neural network output for the training, validation, and testing sets, respectively. If the neural network fit were perfect, all points would fall along the 45° line. The computed root-mean-square deviation of the neural network fit from the ab initio energy values was 0.36 kJ mol⁻¹, which is comparable to the chemical accuracy involved in experimental measurements.

8.3 Iterating for neural network convergence

The minimum distance of each configuration in the initial set (consisting of configurations stored with the Tersoff potential) from all other configurations in the same set is calculated. Fig 8.5 shows the distribution N(d_min) of these minimum distances for the initial set of Si5 configurations from the Tersoff potential. The distribution is close to Gaussian, with the most likely normalized minimum distance being about 0.095.

Fig 8.5 Distribution of minimum distances N(d_min) for the initial set of Si5 configurations using the Tersoff potential
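The within-set minimum distances histogrammed in Fig 8.5 can be computed as below; the 9-component configuration vectors are synthetic placeholders:

```python
import numpy as np

# Sketch: for each configuration in a set, the 2-norm distance to its
# nearest *other* configuration in the same set (the quantity whose
# distribution is shown in Fig 8.5). The vectors are synthetic placeholders.
rng = np.random.default_rng(6)
Q = rng.uniform(size=(400, 9))                # 400 nine-component input vectors

D = np.linalg.norm(Q[:, None, :] - Q[None, :, :], axis=2)  # pairwise distances
np.fill_diagonal(D, np.inf)                   # exclude the self-distance
d_min = D.min(axis=1)                         # one minimum distance per configuration

counts, edges = np.histogram(d_min, bins=30)  # the distribution N(d_min)
print(counts.sum())
```

Cross-set distributions, such as those in Figs 8.6 and 8.7, follow the same pattern with the minimum taken over the other set instead of the same set.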

The next step is to use the trained neural network in MD simulations for the energy and force calculations. While performing the MD simulations using the neural network, novelty detection is also employed to obtain additional configurations that iteratively improve the potential energy hypersurface. The minimum distances of the configurations in the new set (consisting of configurations stored during the MD simulation using the neural network) from the configurations in the initial set (from the Tersoff potential) are now calculated, and the distribution is plotted in the histogram in Fig 8.6. The distribution has a peak close to 0.5.

Fig 8.6 Distribution of minimum distances N(d_min) of the configurations in the first iteration from the initial set of configurations

Fig 8.7 compares the distribution of minimum distances N(d_min) of the configurations from the first iteration with the distribution for the configurations from the Tersoff potential. The dashed distribution corresponds to the minimum distances of the initial

configurations and the solid distribution corresponds to the minimum distances of the configurations from the first iteration. There is very little overlap between the two distributions, indicating that the configurations occurring during the MD simulation using the neural-network-trained ab initio potential energy hypersurface are significantly different from those from the Tersoff potential. This indicates that the new configurations lie in a region of configuration space not adequately sampled by the initial data set using the Tersoff potential.

Fig 8.7 Comparison of the distribution of minimum distances N(d_min) of the configurations in the first iteration from the initial configurations with the distribution of configurations from the Tersoff potential

A new set is formed by concatenating the initial configurations and the configurations from the first iteration, and the minimum-distance distribution is recomputed for the concatenated set. Fig 8.8 shows the distribution of the recomputed minimum distances for the concatenated set. The spread of the histogram in Fig 8.8 is smaller than that in Fig 8.7

because the neural network is now trained on the outlier configurations as well as the initial set of configurations, so the corresponding outputs are close to the target values. Also, the configurations in the two sets may overlap, which reduces the spread of the minimum distances for the concatenated data set.

Fig 8.8 Distribution of the recomputed minimum distances for the concatenated set of configurations from the initial set and the first iteration

For the selection process based on the minimum-distance criterion, a normalized probability value should be used. For this purpose, Parzen's distribution for a multivariate kernel is used (Specht, 1990). In the case of a Gaussian kernel, the multivariate estimate is expressed as

f_A(X) = [1/((2π)^(p/2) σ^p)] (1/m) Σ_{i=1}^{m} exp[ -(X - X_Ai)^T (X - X_Ai) / (2σ²) ], (8.2)

where
m = total number of configurations,
X_Ai = minimum distance for the i-th configuration,

σ = smoothing parameter,
p = dimensionality of the data; since in this case there is only one number, the minimum distance, p = 1.

A small value of σ causes the estimated density function to have distinct modes corresponding to the locations of the samples. A larger value of σ produces a greater degree of interpolation between points. Fig 8.9 shows a plot of Parzen's distribution for the initial set of configurations from the Tersoff potential using σ = 0.01. Fig 8.10 shows a plot of Parzen's distribution for the configurations stored while running the MD simulation using the neural network.

Fig 8.9 Parzen's distribution of minimum distances for the initial set of configurations from the Tersoff potential
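With p = 1, Eqn. (8.2) reduces to a one-dimensional Gaussian kernel estimate. A sketch on a synthetic sample of minimum distances, centered near the 0.095 peak quoted for Fig 8.5:

```python
import numpy as np

# Sketch of Eqn. (8.2) for p = 1: a Gaussian-kernel Parzen estimate of the
# density of minimum distances with smoothing parameter sigma. The samples
# are synthetic, centered near the 0.095 peak quoted in the text.
rng = np.random.default_rng(4)
X_A = rng.normal(0.095, 0.02, size=2000)       # m sampled minimum distances
sigma, p = 0.01, 1

def parzen(x):
    norm = (2 * np.pi) ** (p / 2) * sigma ** p * len(X_A)
    return np.exp(-((x - X_A) ** 2) / (2 * sigma ** 2)).sum() / norm

xs = np.linspace(0.0, 0.2, 201)
dens = np.array([parzen(x) for x in xs])
print(xs[dens.argmax()])                       # peak lies near the sample mean
```

Increasing sigma smooths the estimate toward a single broad mode; decreasing it resolves individual samples, as described above.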

Fig 8.10 Parzen's distribution of minimum distances for the configurations stored while using the neural network trained for the ab initio energy during the MD simulation

As new configuration points are generated during the trajectories computed using the neural network force field, they are added to the database if

P(d_min) ≤ T P(d_min)_max,  or if  ξ ≤ T P(d_min)_max / P(d_min)  and  d_min ≥ d_t, (8.3)

where T is an empirically determined acceptance threshold in the range (0 ≤ T ≤ 1), ξ is a random number whose distribution is uniform on the interval [0, 1], and d_t is a minimum-distance threshold. Now, by examining Fig 8.9 it can be seen that for a given probability there is more than one value of d_min. As a result, when the minimum distance between configurations is very small and close to zero (which is the case when two configurations are almost identical), the corresponding probability is small and the configuration would be

accepted, which is not desired, since there is already at least one configuration in the set that is close to it. To avoid this, the minimum-distance threshold d_t is used in the second condition of equation (8.3), so that a configuration with a low probability is accepted only if its minimum distance is greater than some threshold value.

Fig 8.11 Distribution of minimum distances of the configurations from the second iteration relative to the configurations from the first iteration

Fig 8.11 shows the distribution of the minimum distances of the configurations from the second iteration relative to the configurations from the first iteration. A sharp peak close to zero indicates that the new configurations are not very different from, i.e., are very similar to, the configurations for which the neural network has been trained; the second peak lies at a larger distance. Comparing Fig 8.11 with Fig 8.5, where the peak was close to 0.095, it can be concluded that the neural network training has converged over the configuration space

involved during the simulation. Fig 8.12 shows Parzen's distribution of the minimum distances for the configurations from the second iteration.

Fig 8.12 Parzen's distribution of the minimum distances for the concatenated set of configurations from the initial set and the first iteration

8.4 Neural network convergence using different empirical potentials

The Tersoff potential is used to model the silicon workpiece during the initial step. Configurations of atoms are stored during the simulations and used to perform ab initio calculations with Gaussian 98. A neural network is then fitted to the ab initio energies and forces for use in MD or MC simulations. To examine the effect of using different empirical potentials as the starting point for storing the configurations for Gaussian 98, two potentials proposed by Tersoff (1988, 1989) were selected. The parameters of the two potentials are listed in Table 8.2.

Table 8.2 Tersoff potential parameters

Parameter   Tersoff (1988)    Tersoff (1989)
A (eV)      3.2647 x 10^3     1.8308 x 10^3
B (eV)      9.5373 x 10^1     4.7118 x 10^2
λ (Å⁻¹)     3.2394            2.4799
μ (Å⁻¹)     1.3258            1.7322
β           3.3675 x 10⁻¹     1.1000 x 10⁻⁶
n           2.2956 x 10^1     7.8734 x 10⁻¹
c           4.8381            1.0039 x 10^5
d           2.0417            1.6217 x 10^1
h           0.0000            -5.9825 x 10⁻¹
R (Å)       3.0               2.85
D (Å)       0.2               0.15

MD simulations were performed using the two potentials and the configurations were stored during the simulations. To compare the configurations from the two sets, the minimum distances of each configuration in one set from all the configurations in the second set were computed. Fig 8.13(a) shows a histogram of the minimum distances among all configurations stored using the modified Tersoff potential. Fig 8.13(b) shows a histogram of the minimum distances of the configurations stored using the Tersoff potential from all the configurations stored using the modified Tersoff potential.

Fig 8.13 (a) Distribution of minimum distances among all configurations stored using the modified Tersoff potential. (b) Distribution of minimum distances of the configurations from the Tersoff potential relative to the configurations from the modified Tersoff potential

The larger spread of the histogram in Fig 8.13(b) than in Fig 8.13(a) indicates that the configurations from the two sets are very different from one another. Fig 8.14 shows a plot of the potential energy for the same set of configurations using the two potential energy functions and their corresponding parameters. If the values of the potential energy were the same for a given configuration, the points would lie along a 45° line. The scatter of the points in Fig 8.14 shows that the two potentials yield very different energies for the same configurations.

Fig 8.14 Comparison of the Tersoff and modified Tersoff potential energies (eV) for the same set of configurations

The next step is to compute the ab initio energies and forces using Gaussian 98 (with density functional theory) for these two sets of configurations and to fit a neural network. Fig 8.15 shows a plot of the outputs of the neural networks trained on the two sets.

Fig 8.15 Comparison of the neural network output energies (eV) for configurations from the Tersoff and modified Tersoff potentials

8.5 A method to obtain a conservative force field

Ab initio calculations were performed for clusters of five silicon atoms, and a neural network was trained for either the energy of the cluster or the forces on the central atom of the cluster. During an MD or MC simulation this neural network is used to calculate the energy of and the forces on all atoms in the system. Such simulations are performed on bulk silicon, whereas the forces on the atoms are calculated using a neural network trained for the energy or forces in clusters of five silicon atoms. In effect, the entire workpiece is considered to be made up of separate five-atom clusters, and the corresponding force field represents an isolated cluster of five silicon atoms rather than five

silicon atoms in the bulk, in which case the force field would be affected not only by first neighbors but also by second neighbors, and so on. If, during the MD simulation, only the force on the central atom is computed for every cluster in the workpiece using the neural network, then the resulting force field will not be conservative, i.e., the sums of the forces on all atoms in the X, Y, and Z directions do not add to zero. To obtain a conservative force field it is necessary to compute the forces on the end atoms (the atoms bonded to the central atom within a five-atom cluster) in addition to the central atom. For this purpose all workpiece atoms are grouped into two groups, viz., central atoms and end atoms.

Fig 8.16 Central- and end-atom grouping

During the MD simulation the force is calculated by differentiating the neural network with respect to the coordinates of the central and end atoms. For one set of central and end atoms this force field is denoted F1; for the other set, obtained by switching


More information

Chapter 1: Basics of Vibrations for Simple Mechanical Systems

Chapter 1: Basics of Vibrations for Simple Mechanical Systems Chapter 1: Basics of Vibrations for Siple Mechanical Systes Introduction: The fundaentals of Sound and Vibrations are part of the broader field of echanics, with strong connections to classical echanics,

More information

A Simple Regression Problem

A Simple Regression Problem A Siple Regression Proble R. M. Castro March 23, 2 In this brief note a siple regression proble will be introduced, illustrating clearly the bias-variance tradeoff. Let Y i f(x i ) + W i, i,..., n, where

More information

Intelligent Systems: Reasoning and Recognition. Artificial Neural Networks

Intelligent Systems: Reasoning and Recognition. Artificial Neural Networks Intelligent Systes: Reasoning and Recognition Jaes L. Crowley MOSIG M1 Winter Seester 2018 Lesson 7 1 March 2018 Outline Artificial Neural Networks Notation...2 Introduction...3 Key Equations... 3 Artificial

More information

A Simplified Analytical Approach for Efficiency Evaluation of the Weaving Machines with Automatic Filling Repair

A Simplified Analytical Approach for Efficiency Evaluation of the Weaving Machines with Automatic Filling Repair Proceedings of the 6th SEAS International Conference on Siulation, Modelling and Optiization, Lisbon, Portugal, Septeber -4, 006 0 A Siplified Analytical Approach for Efficiency Evaluation of the eaving

More information

Chaotic Coupled Map Lattices

Chaotic Coupled Map Lattices Chaotic Coupled Map Lattices Author: Dustin Keys Advisors: Dr. Robert Indik, Dr. Kevin Lin 1 Introduction When a syste of chaotic aps is coupled in a way that allows the to share inforation about each

More information

Figure 1: Equivalent electric (RC) circuit of a neurons membrane

Figure 1: Equivalent electric (RC) circuit of a neurons membrane Exercise: Leaky integrate and fire odel of neural spike generation This exercise investigates a siplified odel of how neurons spike in response to current inputs, one of the ost fundaental properties of

More information

Extension of CSRSM for the Parametric Study of the Face Stability of Pressurized Tunnels

Extension of CSRSM for the Parametric Study of the Face Stability of Pressurized Tunnels Extension of CSRSM for the Paraetric Study of the Face Stability of Pressurized Tunnels Guilhe Mollon 1, Daniel Dias 2, and Abdul-Haid Soubra 3, M.ASCE 1 LGCIE, INSA Lyon, Université de Lyon, Doaine scientifique

More information

Chapter 1 Introduction and Kinetics of Particles

Chapter 1 Introduction and Kinetics of Particles Chapter 1 Introduction and Kinetics of Particles 1.1 Introduction There are two ain approaches in siulating the transport equations (heat, ass, and oentu), continuu and discrete. In continuu approach,

More information

Inspection; structural health monitoring; reliability; Bayesian analysis; updating; decision analysis; value of information

Inspection; structural health monitoring; reliability; Bayesian analysis; updating; decision analysis; value of information Cite as: Straub D. (2014). Value of inforation analysis with structural reliability ethods. Structural Safety, 49: 75-86. Value of Inforation Analysis with Structural Reliability Methods Daniel Straub

More information

NBN Algorithm Introduction Computational Fundamentals. Bogdan M. Wilamoswki Auburn University. Hao Yu Auburn University

NBN Algorithm Introduction Computational Fundamentals. Bogdan M. Wilamoswki Auburn University. Hao Yu Auburn University NBN Algorith Bogdan M. Wilaoswki Auburn University Hao Yu Auburn University Nicholas Cotton Auburn University. Introduction. -. Coputational Fundaentals - Definition of Basic Concepts in Neural Network

More information

Quantum algorithms (CO 781, Winter 2008) Prof. Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search

Quantum algorithms (CO 781, Winter 2008) Prof. Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search Quantu algoriths (CO 781, Winter 2008) Prof Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search ow we begin to discuss applications of quantu walks to search algoriths

More information

All Excuses must be taken to 233 Loomis before 4:15, Monday, April 30.

All Excuses must be taken to 233 Loomis before 4:15, Monday, April 30. Miscellaneous Notes he end is near don t get behind. All Excuses ust be taken to 233 Loois before 4:15, Monday, April 30. he PHYS 213 final exa ties are * 8-10 AM, Monday, May 7 * 8-10 AM, uesday, May

More information

REDUCTION OF FINITE ELEMENT MODELS BY PARAMETER IDENTIFICATION

REDUCTION OF FINITE ELEMENT MODELS BY PARAMETER IDENTIFICATION ISSN 139 14X INFORMATION TECHNOLOGY AND CONTROL, 008, Vol.37, No.3 REDUCTION OF FINITE ELEMENT MODELS BY PARAMETER IDENTIFICATION Riantas Barauskas, Vidantas Riavičius Departent of Syste Analysis, Kaunas

More information

Support Vector Machine Classification of Uncertain and Imbalanced data using Robust Optimization

Support Vector Machine Classification of Uncertain and Imbalanced data using Robust Optimization Recent Researches in Coputer Science Support Vector Machine Classification of Uncertain and Ibalanced data using Robust Optiization RAGHAV PAT, THEODORE B. TRAFALIS, KASH BARKER School of Industrial Engineering

More information

EE5900 Spring Lecture 4 IC interconnect modeling methods Zhuo Feng

EE5900 Spring Lecture 4 IC interconnect modeling methods Zhuo Feng EE59 Spring Parallel LSI AD Algoriths Lecture I interconnect odeling ethods Zhuo Feng. Z. Feng MTU EE59 So far we ve considered only tie doain analyses We ll soon see that it is soeties preferable to odel

More information

A NEW ELECTROSTATIC FIELD GEOMETRY. Jerry E. Bayles

A NEW ELECTROSTATIC FIELD GEOMETRY. Jerry E. Bayles INTRODUCTION A NEW ELECTROSTATIC FIELD GEOMETRY by Jerry E Bayles The purpose of this paper is to present the electrostatic field in geoetrical ters siilar to that of the electrogravitational equation

More information

Ufuk Demirci* and Feza Kerestecioglu**

Ufuk Demirci* and Feza Kerestecioglu** 1 INDIRECT ADAPTIVE CONTROL OF MISSILES Ufuk Deirci* and Feza Kerestecioglu** *Turkish Navy Guided Missile Test Station, Beykoz, Istanbul, TURKEY **Departent of Electrical and Electronics Engineering,

More information

Least Squares Fitting of Data

Least Squares Fitting of Data Least Squares Fitting of Data David Eberly, Geoetric Tools, Redond WA 98052 https://www.geoetrictools.co/ This work is licensed under the Creative Coons Attribution 4.0 International License. To view a

More information

Field Mass Generation and Control. Chapter 6. The famous two slit experiment proved that a particle can exist as a wave and yet

Field Mass Generation and Control. Chapter 6. The famous two slit experiment proved that a particle can exist as a wave and yet 111 Field Mass Generation and Control Chapter 6 The faous two slit experient proved that a particle can exist as a wave and yet still exhibit particle characteristics when the wavefunction is altered by

More information

The Wilson Model of Cortical Neurons Richard B. Wells

The Wilson Model of Cortical Neurons Richard B. Wells The Wilson Model of Cortical Neurons Richard B. Wells I. Refineents on the odgkin-uxley Model The years since odgkin s and uxley s pioneering work have produced a nuber of derivative odgkin-uxley-like

More information

26 Impulse and Momentum

26 Impulse and Momentum 6 Ipulse and Moentu First, a Few More Words on Work and Energy, for Coparison Purposes Iagine a gigantic air hockey table with a whole bunch of pucks of various asses, none of which experiences any friction

More information

A Self-Organizing Model for Logical Regression Jerry Farlow 1 University of Maine. (1900 words)

A Self-Organizing Model for Logical Regression Jerry Farlow 1 University of Maine. (1900 words) 1 A Self-Organizing Model for Logical Regression Jerry Farlow 1 University of Maine (1900 words) Contact: Jerry Farlow Dept of Matheatics Univeristy of Maine Orono, ME 04469 Tel (07) 866-3540 Eail: farlow@ath.uaine.edu

More information

Kinematics and dynamics, a computational approach

Kinematics and dynamics, a computational approach Kineatics and dynaics, a coputational approach We begin the discussion of nuerical approaches to echanics with the definition for the velocity r r ( t t) r ( t) v( t) li li or r( t t) r( t) v( t) t for

More information

Kinetic Theory of Gases: Elementary Ideas

Kinetic Theory of Gases: Elementary Ideas Kinetic Theory of Gases: Eleentary Ideas 17th February 2010 1 Kinetic Theory: A Discussion Based on a Siplified iew of the Motion of Gases 1.1 Pressure: Consul Engel and Reid Ch. 33.1) for a discussion

More information

Supplementary Information for Design of Bending Multi-Layer Electroactive Polymer Actuators

Supplementary Information for Design of Bending Multi-Layer Electroactive Polymer Actuators Suppleentary Inforation for Design of Bending Multi-Layer Electroactive Polyer Actuators Bavani Balakrisnan, Alek Nacev, and Elisabeth Sela University of Maryland, College Park, Maryland 074 1 Analytical

More information

Some Perspective. Forces and Newton s Laws

Some Perspective. Forces and Newton s Laws Soe Perspective The language of Kineatics provides us with an efficient ethod for describing the otion of aterial objects, and we ll continue to ake refineents to it as we introduce additional types of

More information

Tracking using CONDENSATION: Conditional Density Propagation

Tracking using CONDENSATION: Conditional Density Propagation Tracking using CONDENSATION: Conditional Density Propagation Goal Model-based visual tracking in dense clutter at near video frae rates M. Isard and A. Blake, CONDENSATION Conditional density propagation

More information

Molecular interactions in beams

Molecular interactions in beams Molecular interactions in beas notable advanceent in the experiental study of interolecular forces has coe fro the developent of olecular beas, which consist of a narrow bea of particles, all having the

More information

Kernel Methods and Support Vector Machines

Kernel Methods and Support Vector Machines Intelligent Systes: Reasoning and Recognition Jaes L. Crowley ENSIAG 2 / osig 1 Second Seester 2012/2013 Lesson 20 2 ay 2013 Kernel ethods and Support Vector achines Contents Kernel Functions...2 Quadratic

More information

Variations on Backpropagation

Variations on Backpropagation 2 Variations on Backpropagation 2 Variations Heuristic Modifications Moentu Variable Learning Rate Standard Nuerical Optiization Conjugate Gradient Newton s Method (Levenberg-Marquardt) 2 2 Perforance

More information

GAUTENG DEPARTMENT OF EDUCATION SENIOR SECONDARY INTERVENTION PROGRAMME. PHYSICAL SCIENCE Grade 11 SESSION 11 (LEARNER NOTES)

GAUTENG DEPARTMENT OF EDUCATION SENIOR SECONDARY INTERVENTION PROGRAMME. PHYSICAL SCIENCE Grade 11 SESSION 11 (LEARNER NOTES) PYSICAL SCIENCE Grade 11 SESSION 11 (LEARNER NOTES) MOLE CONCEPT, STOICIOMETRIC CALCULATIONS Learner Note: The ole concept is carried forward to calculations in the acid and base section, as well as in

More information

NUMERICAL MODELLING OF THE TYRE/ROAD CONTACT

NUMERICAL MODELLING OF THE TYRE/ROAD CONTACT NUMERICAL MODELLING OF THE TYRE/ROAD CONTACT PACS REFERENCE: 43.5.LJ Krister Larsson Departent of Applied Acoustics Chalers University of Technology SE-412 96 Sweden Tel: +46 ()31 772 22 Fax: +46 ()31

More information

Kinetic Theory of Gases: Elementary Ideas

Kinetic Theory of Gases: Elementary Ideas Kinetic Theory of Gases: Eleentary Ideas 9th February 011 1 Kinetic Theory: A Discussion Based on a Siplified iew of the Motion of Gases 1.1 Pressure: Consul Engel and Reid Ch. 33.1) for a discussion of

More information

PHY 171. Lecture 14. (February 16, 2012)

PHY 171. Lecture 14. (February 16, 2012) PHY 171 Lecture 14 (February 16, 212) In the last lecture, we looked at a quantitative connection between acroscopic and icroscopic quantities by deriving an expression for pressure based on the assuptions

More information

ma x = -bv x + F rod.

ma x = -bv x + F rod. Notes on Dynaical Systes Dynaics is the study of change. The priary ingredients of a dynaical syste are its state and its rule of change (also soeties called the dynaic). Dynaical systes can be continuous

More information

ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS. A Thesis. Presented to. The Faculty of the Department of Mathematics

ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS. A Thesis. Presented to. The Faculty of the Department of Mathematics ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS A Thesis Presented to The Faculty of the Departent of Matheatics San Jose State University In Partial Fulfillent of the Requireents

More information

EMPIRICAL COMPLEXITY ANALYSIS OF A MILP-APPROACH FOR OPTIMIZATION OF HYBRID SYSTEMS

EMPIRICAL COMPLEXITY ANALYSIS OF A MILP-APPROACH FOR OPTIMIZATION OF HYBRID SYSTEMS EMPIRICAL COMPLEXITY ANALYSIS OF A MILP-APPROACH FOR OPTIMIZATION OF HYBRID SYSTEMS Jochen Till, Sebastian Engell, Sebastian Panek, and Olaf Stursberg Process Control Lab (CT-AST), University of Dortund,

More information

Ştefan ŞTEFĂNESCU * is the minimum global value for the function h (x)

Ştefan ŞTEFĂNESCU * is the minimum global value for the function h (x) 7Applying Nelder Mead s Optiization Algorith APPLYING NELDER MEAD S OPTIMIZATION ALGORITHM FOR MULTIPLE GLOBAL MINIMA Abstract Ştefan ŞTEFĂNESCU * The iterative deterinistic optiization ethod could not

More information

XI PHYSICS M. AFFAN KHAN LECTURER PHYSICS, AKHSS, K. https://promotephysics.wordpress.com

XI PHYSICS M. AFFAN KHAN LECTURER PHYSICS, AKHSS, K. https://promotephysics.wordpress.com XI PHYSICS M. AFFAN KHAN LECTURER PHYSICS, AKHSS, K affan_414@live.co https://prootephysics.wordpress.co [MOTION] CHAPTER NO. 3 In this chapter we are going to discuss otion in one diension in which we

More information

An Improved Particle Filter with Applications in Ballistic Target Tracking

An Improved Particle Filter with Applications in Ballistic Target Tracking Sensors & ransducers Vol. 72 Issue 6 June 204 pp. 96-20 Sensors & ransducers 204 by IFSA Publishing S. L. http://www.sensorsportal.co An Iproved Particle Filter with Applications in Ballistic arget racing

More information

13 Harmonic oscillator revisited: Dirac s approach and introduction to Second Quantization

13 Harmonic oscillator revisited: Dirac s approach and introduction to Second Quantization 3 Haronic oscillator revisited: Dirac s approach and introduction to Second Quantization. Dirac cae up with a ore elegant way to solve the haronic oscillator proble. We will now study this approach. The

More information

Force and dynamics with a spring, analytic approach

Force and dynamics with a spring, analytic approach Force and dynaics with a spring, analytic approach It ay strie you as strange that the first force we will discuss will be that of a spring. It is not one of the four Universal forces and we don t use

More information

Proc. of the IEEE/OES Seventh Working Conference on Current Measurement Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES

Proc. of the IEEE/OES Seventh Working Conference on Current Measurement Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES Proc. of the IEEE/OES Seventh Working Conference on Current Measureent Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES Belinda Lipa Codar Ocean Sensors 15 La Sandra Way, Portola Valley, CA 98 blipa@pogo.co

More information

1 Brownian motion and the Langevin equation

1 Brownian motion and the Langevin equation Figure 1: The robust appearance of Robert Brown (1773 1858) 1 Brownian otion and the Langevin equation In 1827, while exaining pollen grains and the spores of osses suspended in water under a icroscope,

More information

An Inverse Interpolation Method Utilizing In-Flight Strain Measurements for Determining Loads and Structural Response of Aerospace Vehicles

An Inverse Interpolation Method Utilizing In-Flight Strain Measurements for Determining Loads and Structural Response of Aerospace Vehicles An Inverse Interpolation Method Utilizing In-Flight Strain Measureents for Deterining Loads and Structural Response of Aerospace Vehicles S. Shkarayev and R. Krashantisa University of Arizona, Tucson,

More information

ACTIVE VIBRATION CONTROL FOR STRUCTURE HAVING NON- LINEAR BEHAVIOR UNDER EARTHQUAKE EXCITATION

ACTIVE VIBRATION CONTROL FOR STRUCTURE HAVING NON- LINEAR BEHAVIOR UNDER EARTHQUAKE EXCITATION International onference on Earthquae Engineering and Disaster itigation, Jaarta, April 14-15, 8 ATIVE VIBRATION ONTROL FOR TRUTURE HAVING NON- LINEAR BEHAVIOR UNDER EARTHQUAE EXITATION Herlien D. etio

More information

Bayesian Approach for Fatigue Life Prediction from Field Inspection

Bayesian Approach for Fatigue Life Prediction from Field Inspection Bayesian Approach for Fatigue Life Prediction fro Field Inspection Dawn An and Jooho Choi School of Aerospace & Mechanical Engineering, Korea Aerospace University, Goyang, Seoul, Korea Srira Pattabhiraan

More information

Physics 221B: Solution to HW # 6. 1) Born-Oppenheimer for Coupled Harmonic Oscillators

Physics 221B: Solution to HW # 6. 1) Born-Oppenheimer for Coupled Harmonic Oscillators Physics B: Solution to HW # 6 ) Born-Oppenheier for Coupled Haronic Oscillators This proble is eant to convince you of the validity of the Born-Oppenheier BO) Approxiation through a toy odel of coupled

More information

An Approximate Model for the Theoretical Prediction of the Velocity Increase in the Intermediate Ballistics Period

An Approximate Model for the Theoretical Prediction of the Velocity Increase in the Intermediate Ballistics Period An Approxiate Model for the Theoretical Prediction of the Velocity... 77 Central European Journal of Energetic Materials, 205, 2(), 77-88 ISSN 2353-843 An Approxiate Model for the Theoretical Prediction

More information

Using EM To Estimate A Probablity Density With A Mixture Of Gaussians

Using EM To Estimate A Probablity Density With A Mixture Of Gaussians Using EM To Estiate A Probablity Density With A Mixture Of Gaussians Aaron A. D Souza adsouza@usc.edu Introduction The proble we are trying to address in this note is siple. Given a set of data points

More information

Comparison of Charged Particle Tracking Methods for Non-Uniform Magnetic Fields. Hann-Shin Mao and Richard E. Wirz

Comparison of Charged Particle Tracking Methods for Non-Uniform Magnetic Fields. Hann-Shin Mao and Richard E. Wirz 42nd AIAA Plasadynaics and Lasers Conferencein conjunction with the8th Internati 27-30 June 20, Honolulu, Hawaii AIAA 20-3739 Coparison of Charged Particle Tracking Methods for Non-Unifor Magnetic

More information

Non-Parametric Non-Line-of-Sight Identification 1

Non-Parametric Non-Line-of-Sight Identification 1 Non-Paraetric Non-Line-of-Sight Identification Sinan Gezici, Hisashi Kobayashi and H. Vincent Poor Departent of Electrical Engineering School of Engineering and Applied Science Princeton University, Princeton,

More information

Physically Based Modeling CS Notes Spring 1997 Particle Collision and Contact

Physically Based Modeling CS Notes Spring 1997 Particle Collision and Contact Physically Based Modeling CS 15-863 Notes Spring 1997 Particle Collision and Contact 1 Collisions with Springs Suppose we wanted to ipleent a particle siulator with a floor : a solid horizontal plane which

More information

EXPLORING PHASE SPACES OF BIOMOLECULES WITH MONTE CARLO METHODS

EXPLORING PHASE SPACES OF BIOMOLECULES WITH MONTE CARLO METHODS Journal of Optoelectronics and Advanced Materials Vol. 7, o. 3, June 2005, p. 1563-1571 EXPLORIG PHASE SPACES OF BIOMOLECULES WITH MOTE CARLO METHODS A. Bu u * ational Institute for Research and Developent

More information

Sharp Time Data Tradeoffs for Linear Inverse Problems

Sharp Time Data Tradeoffs for Linear Inverse Problems Sharp Tie Data Tradeoffs for Linear Inverse Probles Saet Oyak Benjain Recht Mahdi Soltanolkotabi January 016 Abstract In this paper we characterize sharp tie-data tradeoffs for optiization probles used

More information

Hee = ~ dxdy\jj+ (x) 'IJ+ (y) u (x- y) \jj (y) \jj (x), V, = ~ dx 'IJ+ (x) \jj (x) V (x), Hii = Z 2 ~ dx dy cp+ (x) cp+ (y) u (x- y) cp (y) cp (x),

Hee = ~ dxdy\jj+ (x) 'IJ+ (y) u (x- y) \jj (y) \jj (x), V, = ~ dx 'IJ+ (x) \jj (x) V (x), Hii = Z 2 ~ dx dy cp+ (x) cp+ (y) u (x- y) cp (y) cp (x), SOVIET PHYSICS JETP VOLUME 14, NUMBER 4 APRIL, 1962 SHIFT OF ATOMIC ENERGY LEVELS IN A PLASMA L. E. PARGAMANIK Khar'kov State University Subitted to JETP editor February 16, 1961; resubitted June 19, 1961

More information

Supporting information for Self-assembly of multicomponent structures in and out of equilibrium

Supporting information for Self-assembly of multicomponent structures in and out of equilibrium Supporting inforation for Self-assebly of ulticoponent structures in and out of equilibriu Stephen Whitela 1, Rebecca Schulan 2, Lester Hedges 1 1 Molecular Foundry, Lawrence Berkeley National Laboratory,

More information

m potential kinetic forms of energy.

m potential kinetic forms of energy. Spring, Chapter : A. near the surface of the earth. The forces of gravity and an ideal spring are conservative forces. With only the forces of an ideal spring and gravity acting on a ass, energy F F will

More information

Chapter 2: Introduction to Damping in Free and Forced Vibrations

Chapter 2: Introduction to Damping in Free and Forced Vibrations Chapter 2: Introduction to Daping in Free and Forced Vibrations This chapter ainly deals with the effect of daping in two conditions like free and forced excitation of echanical systes. Daping plays an

More information

In this chapter, we consider several graph-theoretic and probabilistic models

In this chapter, we consider several graph-theoretic and probabilistic models THREE ONE GRAPH-THEORETIC AND STATISTICAL MODELS 3.1 INTRODUCTION In this chapter, we consider several graph-theoretic and probabilistic odels for a social network, which we do under different assuptions

More information

ANALYSIS OF HALL-EFFECT THRUSTERS AND ION ENGINES FOR EARTH-TO-MOON TRANSFER

ANALYSIS OF HALL-EFFECT THRUSTERS AND ION ENGINES FOR EARTH-TO-MOON TRANSFER IEPC 003-0034 ANALYSIS OF HALL-EFFECT THRUSTERS AND ION ENGINES FOR EARTH-TO-MOON TRANSFER A. Bober, M. Guelan Asher Space Research Institute, Technion-Israel Institute of Technology, 3000 Haifa, Israel

More information

Bayes Decision Rule and Naïve Bayes Classifier

Bayes Decision Rule and Naïve Bayes Classifier Bayes Decision Rule and Naïve Bayes Classifier Le Song Machine Learning I CSE 6740, Fall 2013 Gaussian Mixture odel A density odel p(x) ay be ulti-odal: odel it as a ixture of uni-odal distributions (e.g.

More information

e-companion ONLY AVAILABLE IN ELECTRONIC FORM

e-companion ONLY AVAILABLE IN ELECTRONIC FORM OPERATIONS RESEARCH doi 10.1287/opre.1070.0427ec pp. ec1 ec5 e-copanion ONLY AVAILABLE IN ELECTRONIC FORM infors 07 INFORMS Electronic Copanion A Learning Approach for Interactive Marketing to a Custoer

More information

Hybrid System Identification: An SDP Approach

Hybrid System Identification: An SDP Approach 49th IEEE Conference on Decision and Control Deceber 15-17, 2010 Hilton Atlanta Hotel, Atlanta, GA, USA Hybrid Syste Identification: An SDP Approach C Feng, C M Lagoa, N Ozay and M Sznaier Abstract The

More information

A DESIGN GUIDE OF DOUBLE-LAYER CELLULAR CLADDINGS FOR BLAST ALLEVIATION

A DESIGN GUIDE OF DOUBLE-LAYER CELLULAR CLADDINGS FOR BLAST ALLEVIATION International Journal of Aerospace and Lightweight Structures Vol. 3, No. 1 (2013) 109 133 c Research Publishing Services DOI: 10.3850/S201042862013000550 A DESIGN GUIDE OF DOUBLE-LAYER CELLULAR CLADDINGS

More information

Pattern Recognition and Machine Learning. Artificial Neural networks

Pattern Recognition and Machine Learning. Artificial Neural networks Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2016/2017 Lessons 9 11 Jan 2017 Outline Artificial Neural networks Notation...2 Convolutional Neural Networks...3

More information

Transactions on Engineering Sciences vol 3, 1993 WIT Press, ISSN

Transactions on Engineering Sciences vol 3, 1993 WIT Press,  ISSN A 3D Monte Carlo seiconductor device siulator for subicron silicon transistors* MOS K. Tarnay,^ p Masszi,& A. Poppe,* R Verhas," T. Kocsis," Zs. Kohari* " Technical University of Budapest, Departent of

More information

Massachusetts Institute of Technology Quantum Mechanics I (8.04) Spring 2005 Solutions to Problem Set 4

Massachusetts Institute of Technology Quantum Mechanics I (8.04) Spring 2005 Solutions to Problem Set 4 Massachusetts Institute of Technology Quantu Mechanics I (8.04) Spring 2005 Solutions to Proble Set 4 By Kit Matan 1. X-ray production. (5 points) Calculate the short-wavelength liit for X-rays produced

More information

Reducing Vibration and Providing Robustness with Multi-Input Shapers

Reducing Vibration and Providing Robustness with Multi-Input Shapers 29 Aerican Control Conference Hyatt Regency Riverfront, St. Louis, MO, USA June -2, 29 WeA6.4 Reducing Vibration and Providing Robustness with Multi-Input Shapers Joshua Vaughan and Willia Singhose Abstract

More information

Envelope frequency Response Function Analysis of Mechanical Structures with Uncertain Modal Damping Characteristics

Envelope frequency Response Function Analysis of Mechanical Structures with Uncertain Modal Damping Characteristics Copyright c 2007 Tech Science Press CMES, vol.22, no.2, pp.129-149, 2007 Envelope frequency Response Function Analysis of Mechanical Structures with Uncertain Modal Daping Characteristics D. Moens 1, M.

More information

Classical systems in equilibrium

Classical systems in equilibrium 35 Classical systes in equilibriu Ideal gas Distinguishable particles Here we assue that every particle can be labeled by an index i... and distinguished fro any other particle by its label if not by any

More information

University of Bath DOI: /OE Publication date: Link to publication

University of Bath DOI: /OE Publication date: Link to publication Citation for published version: Chen, L & Bird, DM 2011, 'Guidance in Kagoe-like photonic crystal fibres II: perturbation theory for a realistic fibre structure' Optics Express, vol. 19, no. 7, pp. 6957-6968.

More information

On Constant Power Water-filling

On Constant Power Water-filling On Constant Power Water-filling Wei Yu and John M. Cioffi Electrical Engineering Departent Stanford University, Stanford, CA94305, U.S.A. eails: {weiyu,cioffi}@stanford.edu Abstract This paper derives

More information

Incorporating strain gradient effects in a multi-scale constitutive framework for nickel-base superalloys

Incorporating strain gradient effects in a multi-scale constitutive framework for nickel-base superalloys Incorporating strain gradient effects in a ulti-scale constitutive fraework for nickel-base superalloys Tiedo Tinga, Marcel Brekelans, Marc Geers To cite this version: Tiedo Tinga, Marcel Brekelans, Marc

More information

Ensemble Based on Data Envelopment Analysis

Ensemble Based on Data Envelopment Analysis Enseble Based on Data Envelopent Analysis So Young Sohn & Hong Choi Departent of Coputer Science & Industrial Systes Engineering, Yonsei University, Seoul, Korea Tel) 82-2-223-404, Fax) 82-2- 364-7807

More information

WIRELESS DRIVE OF PIEZOELECTRIC COMPONENTS SATYANARAYAN BHUYAN 2011

WIRELESS DRIVE OF PIEZOELECTRIC COMPONENTS SATYANARAYAN BHUYAN 2011 WIRELESS DRIVE OF PIEZOELECTRIC COMPONENTS SATYANARAYAN BHUYAN 011 WIRELESS DRIVE OF PIEZOELECTRIC COMPONENTS SATYANARAYAN BHUYAN SCHOOL OF ELECTRICAL & ELECTRONIC ENGINEERING 011 SATYANARAYAN BHUYAN WIRELESS

More information

Physics 139B Solutions to Homework Set 3 Fall 2009

Physics 139B Solutions to Homework Set 3 Fall 2009 Physics 139B Solutions to Hoework Set 3 Fall 009 1. Consider a particle of ass attached to a rigid assless rod of fixed length R whose other end is fixed at the origin. The rod is free to rotate about

More information

Department of Electronic and Optical Engineering, Ordnance Engineering College, Shijiazhuang, , China

Department of Electronic and Optical Engineering, Ordnance Engineering College, Shijiazhuang, , China 6th International Conference on Machinery, Materials, Environent, Biotechnology and Coputer (MMEBC 06) Solving Multi-Sensor Multi-Target Assignent Proble Based on Copositive Cobat Efficiency and QPSO Algorith

More information