Quantum Monte Carlo for Transition Metal Systems: Method Developments and Applications. Lucas K. Wagner


ABSTRACT

WAGNER, LUCAS K. Quantum Monte Carlo for Transition Metal Systems: Method Developments and Applications. (Under the direction of Professor Lubos Mitas.)

Quantum Monte Carlo (QMC) is a powerful computational tool for studying correlated systems of electrons, allowing us to treat many-body interactions explicitly with favorable scaling in the number of particles. It has been regarded as a benchmark tool for condensed matter systems containing elements from the first and second rows of the periodic table. It holds particular promise for the more complicated transition metals, because QMC treats the correlations between electrons explicitly and has a computational cost that scales well with system size. We have developed a QMC framework that is capable of simulating systems containing many electrons efficiently, through advanced algorithms and parallel operation. This framework includes a QMC program using state of the art methods that make many interesting quantities available. We apply a method of finding the minimum and other properties of the potential energy surface in the face of stochastic noise, using Bayesian inference and the total energy. We apply these developments to several transition metal systems, including the first five transition metal monoxide molecules and two interesting ABO3 perovskite solids: BaTiO3 and BiFeO3. Where experiment is available, QMC is generally in agreement with it, with a few exceptions that are discussed. Where experiment is unavailable, QMC makes predictions that can help us understand somewhat ambiguous experimental results.

Quantum Monte Carlo for Transition Metal Systems: Method Developments and Applications

by Lucas K. Wagner

A dissertation submitted to the Graduate Faculty of North Carolina State University in partial fulfillment of the requirements for the Degree of Doctor of Philosophy

Physics

Raleigh 2006

Approved By:
Dr. M. Buongiorno-Nardelli
Dr. D. Lee
Dr. L. Mitas, Chair of Advisory Committee
Dr. K. Ito

To Marta and my parents. A Marta e ai miei genitori.

Biography

I was born September 21, 1980 in the city of Danville, in the southwestern part of Virginia, to James and Deborah Wagner. I grew up near the town of Chatham, about 10 miles north of Danville, with a population of around 1,300 people. I spent kindergarten and first grade at Chatham Elementary School, moved to Climax Elementary for second through fifth grade, Central Middle for sixth and seventh, and finished my primary schooling at Chatham High School. I entered North Carolina State University as a major in physics. I worked one semester for the WebAssign project, and immediately sought out a place to work in a real lab. I approached Dr. Jacqueline Krim for a place as an undergraduate researcher and she agreed. I started in my second semester at NCSU. The first day, I leveled all the desks in her office and painted her filing cabinet. I spent some time helping her group move in and set up the lab, since she had just moved from Northeastern University. After a while, I did perform some research in her lab, none of which was probably worthy of publishing, but I very much enjoyed it. I worked on various projects: finding a quartz crystal that would still oscillate at 500 Celsius, building an ultra-high vacuum vapor deposition chamber, and ripping postdoc Brian Mason's carefully constructed superconductivity-dependent friction experiment to pieces.

In my second year of university, I started to get interested in mathematics, and I ended up adding a second major of applied mathematics that year. I was very interested in using mathematics, together with computers, to solve problems that would otherwise be intractable. This is the topic of this dissertation, so the thought has stayed with me. The thought is something like this: the computer Deep Blue was able to beat one of the best chess players in the world. Its creators are absolutely not able to beat him, and probably would not be able to come even close. I imagine that to them (and me), such a thing is just about as hard as, say, understanding the complicated motion of many, many objects all interacting with each other. Nonetheless, with substantial effort, it seems to be possible to use a computer to filter a really complicated problem down to something we can handle. This dissertation is about a tiny step in that direction.

In the beginning of my third year in the undergraduate curriculum, I left Dr. Krim's lab to work with Dr. Mitas. We worked on silicon nanocrystals, which resulted in a few papers and an award from the college for the best undergraduate research. I completed my double major in physics and applied mathematics, and Dr. Mitas

convinced me to stay on with him for my PhD work. In my second year as a doctoral candidate, I applied for and received the National Science Foundation Graduate Research Fellowship, which I have appreciated greatly.

Acknowledgements

First, I would like to thank Dr. Lubos Mitas for bringing me through the six long years from the middle of my undergraduate education to the end of the PhD program. I have learned a lot from him in that time. Also, thanks for the trips to Italy and France, which altered the course of my life tremendously. Thanks to Jeff Grossman for working with me throughout my graduate career, as well as my other colleagues who I've worked with within the Mitas group: Michal Bajdich, Prasenjit Sen, Jindrich Kolerenc, Gabriel Drobny, Ji-Woo Lee, and David Sulock. I'd also like to thank Mark Losego for stimulating conversations about BiFeO3. Thanks to Marta Cerruti, who provided me with love, support, and excellent food! Thanks to my parents, who have provided much of the same, although for a longer time and a bit differently. Finally, I would like to thank the National Center for Supercomputing Applications, the Pittsburgh Supercomputing Center, and the Physical and Mathematical Sciences cluster for the computing time that I used to complete this dissertation.

Contents

List of Figures ix
List of Tables xi
List of Symbols and Abbreviations xiii

1 Introduction
    Ab Initio Calculations
    Transition Metal Oxides
    Outline
    Accomplishments

2 Electronic Structure Methods Other than Quantum Monte Carlo
    Hartree-Fock and Post Hartree-Fock
    Density Functional Theory

3 Quantum Monte Carlo
    Variational Monte Carlo
    Projector Monte Carlo
        Diffusion Monte Carlo
        Reptation Monte Carlo
    Pseudopotentials
    Periodic Boundary Conditions
        Finite Size Error
    Summary

4 QWalk: An Implementation of Quantum Monte Carlo
    Organization and Implementation
    Methods
        Variational Monte Carlo
        Optimization of Wave Functions
        Diffusion Monte Carlo
        Reptation Monte Carlo
        Correlated Sampling
    Systems
        Boundary Conditions
        Pseudopotentials
    Forms of Wave function
        Slater Determinant(s)
        Jastrow Factor
        Pfaffian Pairing Wave Function
        One-particle orbital evaluation
    Example calculation
    Other Utilities
        Conversion of One-particle Orbitals
        Plane Wave to LCAO converter
    Conclusion

5 Theory: Forces in Quantum Monte Carlo
    Approach
        Filippi and Umrigar Approximation
        Pierleoni and Ceperley Approximation
    Test systems
        H2
        N2
        TiO
    Conclusions
    Notes
        Derivative of the Energy with Respect to the Lattice Constant

6 Theory: Bayesian Fitting
    Theory
    Choosing Data Points
        Optimal Fitting Function
        Finding the optimal next data point
    Summary

7 Application: Transition Metal Monoxide Molecules
    Introduction
    Computational Parameters
    Results and Discussion
        Energetics
        Dipole Moment
    Conclusions

8 Application: Perovskite Crystals
    BaTiO3
        Computational Parameters
        Optimal One-Particle Orbitals and Binding Energy
        Lattice Constant
        Finite Simulation Cell Error
        Zero-Point Energy
        Lattice Dynamics
    BiFeO3
    Conclusions

9 Summary 86

Bibliography 88

A Electric Polarization with 3D Periodic Boundary Conditions 99
B Using QWalk to Perform BaTiO3 Calculations 103
    B.1 Variational Monte Carlo
    B.2 Diffusion Monte Carlo
C Adding a Module in QWalk 106

List of Figures

3.1 DMC simulation for a one-dimensional harmonic oscillator.
3.2 Finite size scaling of cubic BaTiO3 using a Slater determinant of LDA orbitals in VMC without a Jastrow factor. The averaged and weighted k-points are over the four real k-points in the cubic cell.
4.1 Calculation structure for the VMC method on a molecule using a Slater-Jastrow wave function.
4.2 Simple VMC code.
4.3 Flow of a QMC calculation.
4.4 Example input file for VMC evaluation of properties. This corresponds to the fourth box in Fig.
4.5 Example input file for optimization of variational parameters. This is the fifth box in Fig.
4.6 Example input file for evaluation of properties of the correlated wave function, plus a DMC calculation. This corresponds to the sixth and seventh block in Fig.
4.7 Scaling of the QWalk code over processors in Monte Carlo steps per second. The system is a 2x2x2 cell of BaTiO3 with 320 electrons and one walker per node. This is VMC; DMC is very much the same, because of the constant walker algorithm. This is close to a worst-case scenario for QMC, since the run was only approximately 40 seconds per processor.
5.1 Conservation of energy for different methods of calculating the force on H2. Both the F-U and P-C approximations are equally good in this case.
5.2 Calculated energy difference as a function of the error in a Jastrow parameter. An accurate 3-body Jastrow factor was computed and one parameter was varied by multiplying it by a, so at a = 1, the accurate Jastrow is used.
5.3 Test of conservation of energy on N2 using a two-body Jastrow factor. The P-C reweighting does not obtain the correct RMC energy differences when integrated. With a 3-body Jastrow, they will obtain the same results, and with a worse Jastrow, they will both deviate from the total energy calculations, as shown in Fig.
5.4 Energy difference as a function of the error in the Jastrow factor for the TiO molecule. This was done in the same way as in Fig. 5.2. It is not possible to extrapolate to the correct answer since the error in the Jastrow factor is not a function of only one variable.
6.1 Convergence of the minimum as we add data points starting from one side.
6.2 Convergence of the minimum as we add data points starting from both sides and filling in the middle.
6.3 Calculated minimum energy lattice constant of BaTiO3 by LDA for different fitting functions.
7.1 The d-p hybridization orbital (doubly occupied) for TiO in Hartree-Fock (top) and B3LYP (bottom). B3LYP enhances the hybridization significantly, which leads to lower energy in QMC.
7.2 The d-p hybridization orbital (doubly occupied) for MnO in Hartree-Fock (top) and B3LYP (bottom). B3LYP enhances the hybridization significantly, which leads to lower energy in QMC.
7.3 The energy gain in DMC from using B3LYP orbitals as a function of the metal monoxide. The line is a guide to the eye.
7.4 The number of determinants versus the energy and dipole moment for TiO. The dipole moments are shifted downwards by 0.1 Debye to correct for the pseudopotential error.
8.1 BaTiO3 structure. The titanium is the central atom, with barium at the corners and oxygen on the faces.
8.2 BaTiO3 one-particle density: Hartree-Fock versus LDA.
8.3 BaTiO3 one-particle density isosurface in the tetragonal phase.
8.4 Parameters for the fitting function $E(a) = b(1 - e^{-c(a - a_0)})^2 + E_0$ from a 2x2x2 supercell calculation of BaTiO3, along with the mean and standard deviation of the gaussian fits.
8.5 Zero point energy correction for BaTiO3.
8.6 Dynamic shift in the lattice constant for BaTiO3 along with the energy per cell, as a function of the soft-mode distortion. This is calculated in LDA using Crystal.
8.7 Formula unit cell of BiFeO3 in the ferroelectric phase, with Fe in the center and Bi at the corners. The oxygen cage is rotated and the cell is slightly distorted from the cubic phase. The antiferromagnetic ordering is in the 111 direction.
8.8 Experimental measurements of thin films for BiFeO3. The references are as follows: squares [1], circles [2], diamonds [3], triangle up [4], and triangle down [5]. What appears to be many measurements in the center of the graph is a result from the same experiment, which one would expect to be correlated. 84

List of Tables

1.1 Significant accomplishments reported in this dissertation.
3.1 List of approximations in the QMC method. They are decomposed into systematic approximations, which can be resolved by changing parameters in the simulation, and methodological approximations, which are inherent in the method. In principle, it is known how to remove each of these approximations, although it may be prohibitively expensive to do so for a large system.
4.1 The central objects of the code and their physical correspondents.
4.2 Optimization objective functions implemented.
4.3 Summary of which methods work on which materials with which Jastrow factor.
7.1 Binding energies of the first five transition metal monoxides by different theoretical methods, along with RMS deviations from experiment (all in eV). Statistical uncertainties in units of $10^{-2}$ eV are shown in parentheses for Monte Carlo and experimental results. Zero point energy corrections are estimated to be much less than the size of the uncertainty in experiment.
7.2 Bond lengths in Å. The statistical uncertainties for ScO, TiO, VO, CrO, and MnO are respectively 0.002, 0.003, 0.003, 0.004, and
7.3 Dipole moments in Debye. The fixed-node RMC results have been obtained with a single determinant of B3LYP orbitals. See text for an analysis of the errors involved for the case of TiO.
8.1 Cohesive energy of BaTiO3 by various first principles methods. All theoretical calculations are shifted by a 0.13 eV zero point energy correction.
8.2 DMC data for the energy of BaTiO3 as a function of lattice constant (2x2x2 cell, averaging over 4 real k-points).
8.3 Structural parameters for BiFeO3 in our calculations, given in terms of the lattice vectors. The lattice constant a was set to Å, and the rhombohedral angle α was degrees. All parameters were taken from the experimental values reported in Ref. [2].

B.1 The barium pseudopotential given as a sum of Gaussians: $V = \sum_k c_k r^{n_k} e^{-\alpha_k r^2}$. $Z_{\mathrm{eff}}$ is 10e.

List of Symbols and Abbreviations

Å: $10^{-10}$ meters
CC: Coupled Cluster technique
CCSD(T): Coupled Cluster with singles, doubles, and perturbative triples
DFT: Density Functional Theory
DMC: Diffusion Monte Carlo
$E_n$: Energy eigenvalue of the Hamiltonian
GGA: Generalized Gradient Approximation of DFT
$H$ or $\hat{H}$: Hamiltonian operator
LDA: Local Density Approximation of DFT
$N_e$: Number of electrons
QMC: Quantum Monte Carlo
$r$: Position of a single electron
$R$: Point in many-body space
RMC: Reptation Monte Carlo
VMC: Variational Monte Carlo
$Z$: Nuclear charge
$\Phi_n$: Eigenfunction of the Hamiltonian
$\Psi_T$: Trial wave function
$\mu$: Dipole moment

Chapter 1

Introduction

1.1 Ab Initio Calculations

With the Schrödinger equation, the rules of non-relativistic quantum mechanics, we can describe almost all of condensed matter. We are confident that these rules, if solved appropriately, can predict the behavior of materials, from the configurations in which atoms will be found, to the hardness of a material, to the color and optical behavior of an object. We have a universal equation for all of these things; the only problem is that it is extremely difficult to solve for many interacting particles: the well-known many-body problem. This is worse than the many-body problem for classical systems, like planetary motion, because there does not exist a numerically exact, polynomially scaling method that solves the Schrödinger equation. So we are left performing some kind of approximation, which necessarily causes a loss of universality.

The oldest approach is to try to capture the physics, which was the only option before computers, notwithstanding some early human computers. While the real system is extremely complex and presumably hopeless to describe accurately, one can try to make a model system that has many of the same features, and solve that one instead. This is particularly useful for explaining phenomena, and has been used to great success. Examples include the free electron (and now Fermi liquid) theory of metals, the BCS theory of superconductivity, and the Hubbard model of electron correlation. This approach is powerful, and allows us to understand by analogy. It reduces the dimensionality of the problem significantly, allowing us to have a good understanding of a material by specifying a few parameters for the model system. If we want to calculate something about a specific system, however, we either have to estimate or calculate those parameters for that system. To use an admittedly old (but still in use) example, the electrons in a metal can be modeled by free electrons with an effective mass that is different from the real electronic mass and includes the effects of many-body interactions. One can easily calculate the behavior of the metal as a function of the effective mass, but to study a particular system, the effective mass has to be obtained from somewhere, usually from experiment.

To go beyond the capture-the-physics model, we wish to use the Schrödinger equation to obtain a predictive theory without any adjustable parameters. Given the types of nuclei in a system, we want to be able to know what kind of material they will form, what their lowest energy geometry will be, and the properties of the material. In principle, this is possible, but as the number of atoms increases, the Schrödinger equation becomes infeasible to solve exactly. We first make a small general approximation, the Born-Oppenheimer approximation, which assumes that the wave function of nuclei and electrons is separable; that is, $\Psi(R_{\rm nuclei}, R_{\rm electron}) = \Psi_n(R_{\rm nuclei})\Psi_e(R_{\rm electron})$. This is usually a good approximation because the nuclei weigh at least one thousand times more than the electrons, which means that their motions can, to a good approximation, be decoupled. In this thesis, we will focus on solving for the electronic wave function $\Psi_e$. $\Psi_n$ can be solved for using many of the same methods, especially Quantum Monte Carlo. Often, heavy nuclei can be treated classically with very little approximation.
If we were able to solve for $\Psi_e$, much of condensed matter physics would be described to extremely high accuracy, barring only high-energy physics and extremely relativistic particles. The Schrödinger equation for $\Psi_e$, which we will refer to as $\Psi$ for the rest of the dissertation, is as follows (in atomic units):

$$ i\frac{d\Psi(R,t)}{dt} = \hat{H}\Psi(R,t), \tag{1.1} $$

where $\hat{H}$ is the electronic Hamiltonian operator

$$ \hat{H} = -\frac{1}{2}\sum_i \nabla_i^2 - \sum_{i,I} \frac{Z_I}{r_{iI}} + \sum_{i<j} \frac{1}{r_{ij}}, \tag{1.2} $$

and large/small indices stand for nuclei/electrons. $Z_I$ is the nuclear charge. Since the Hamiltonian operator does not depend on time, it suffices to find the stationary states

$\Phi_n$ such that $\hat{H}\Phi_n = E_n\Phi_n$. While at finite temperatures the electrons will occupy a combination of states according to quantum statistical mechanics, at low temperatures they will stay in the ground state to a very good approximation, only visiting the excited states upon some kind of perturbation to the system, such as an incident photon. How low the temperature needs to be depends on the system, but for insulators and molecules, room temperature is well within this envelope. So we can further restrict ourselves to finding the ground state $\Phi_0$ and the first few excited states above it, while still covering a large portion of materials of interest. To current knowledge, this is still an exponentially hard problem.

Given the high desirability of $\Phi_0$, there have been many attempts to calculate it over the past 60 or so years. Many theories and approximations have been developed, the most common of which are Hartree-Fock and post Hartree-Fock methods, and Density Functional Theory (DFT). These methods and many others function by a reduction technique, where the many-body problem is reduced to a set of one-body problems. In Hartree-Fock and post Hartree-Fock, the functional form of the many-body wave function is limited, which in turn limits the physics that can be described. In Density Functional Theory, the problem is reduced from working with the many-body wave function to working with the one-body density, a three-dimensional object. This reduction is in principle exact, but extracting the energy of the system requires an unknown functional. While computationally quite efficient, these reductions are also limiting. With Quantum Monte Carlo (QMC), we treat the full many-body problem at the cost of computer time.
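The "many-body accuracy at the cost of computer time" trade can be made concrete with a toy variational Monte Carlo calculation. The sketch below is illustrative plain Python (not QWalk code), for the one-dimensional harmonic oscillator with a hypothetical Gaussian trial function $\exp(-ax^2)$; at $a = 1/2$ this trial function is the exact ground state, so the local energy is constant.

```python
import random, math

def vmc_harmonic(a, n_steps=20000, step=1.0, seed=0):
    """Metropolis VMC for the 1D harmonic oscillator with trial
    psi(x) = exp(-a x^2); returns the mean local energy (atomic units)."""
    rng = random.Random(seed)
    x = 0.0
    e_sum = 0.0
    n_kept = 0
    for i in range(n_steps):
        x_new = x + step * (rng.random() - 0.5)
        # Acceptance ratio |psi(x_new)/psi(x)|^2 for the symmetric proposal
        ratio = math.exp(-2.0 * a * (x_new**2 - x**2))
        if rng.random() < ratio:
            x = x_new
        # Local energy E_L = -psi''/(2 psi) + x^2/2 = a - 2 a^2 x^2 + x^2/2
        e_loc = a - 2.0 * a**2 * x**2 + 0.5 * x**2
        if i >= n_steps // 5:        # discard equilibration steps
            e_sum += e_loc
            n_kept += 1
    return e_sum / n_kept

# At a = 0.5 the trial function is exact, so E_L = 0.5 with no variance.
print(vmc_harmonic(0.5))   # -> 0.5
# Any other a gives a higher mean energy, by the variational theorem.
print(vmc_harmonic(0.3))
```

The zero-variance property at the exact trial function is the same principle that makes good trial wave functions so valuable in real QMC calculations.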
By this point, QMC has established itself as a source of excellent energetic properties, usually producing the most accurate cohesive energies available from a reasonably scaling method in materials ranging from organic molecules to transition metal systems [6, 7, 8, 9]. Band gaps and excitation energies have been calculated as well [10, 8, 11], with excellent agreement with experiment. There has been much work on forces [12, 13, 14, 15, 16], but there has not been a definitive method for calculating the energy derivative in all systems. There has also been some work on non-energetic quantities, such as the dipole moment [17], but most of these studies used the so-called pure Diffusion Monte Carlo method [18, 19], which does not work for larger systems. Recently, the Reptation Monte Carlo [20] method has been developed, which shows promise for larger systems, but has not been applied to chemical problems.

Despite this promise, there are not many available programs that perform Quantum Monte Carlo calculations. Discounting pure research codes, to my knowledge there are only three that can perform calculations with periodic boundary conditions: CHAMP of Umrigar and Filippi, CASINO from Cambridge, and QWalk, developed in this dissertation. CHAMP is currently (at the time of this writing) available only to collaborators of the respective groups, while CASINO is available under a somewhat restrictive license. We plan to release QWalk under the GNU Public License, which will make it the first general purpose (meaning that it can calculate both solids and molecules) QMC program available under that license.

1.2 Transition Metal Oxides

Transition metal chemistry is an exciting area of research that has implications in fields from biological physics to astrophysics. Transition metals can form many types of bonds, and transition metal solids exhibit useful properties like ferroelectric and ferromagnetic ordering. This interesting physics comes from the d-shell states, which are very close in energy to the outer s states, along with strongly correlated electrons that make accurate first principles calculations particularly challenging.

For small molecules it is possible, although challenging, to use post Hartree-Fock methods to calculate accurate properties of transition metal oxides. This approach has poor scaling, however, and for that reason post Hartree-Fock methods are not applied to solids. We are left with DFT, which tends to give binding energies, band gaps, and, in transition metal oxides, lattice constants in error. Using QMC, there have been studies of the antiferromagnets NiO [21, 9] and MnO [8]. Except for Ref. [21], which performed a very rough optimization of the lattice constant within Variational Monte Carlo, all the studies calculated only the cohesive energy, which comes quite close to experiment. For many systems, the lattice constant is of great interest.
The transition metal-oxygen bond is extremely sensitive to the distance between the metal and the oxygen, and so the minimum energy lattice constant determines most of the properties of a material. This is particularly visible in the ferroelectric effect, where a change of the lattice constant by only a few percent can make the difference between a ferroelectric material and a paraelectric material. Both high accuracy and high precision are thus needed in calculating lattice constants for solids. I have resolved this problem in one way, with the Bayesian analysis techniques in Chapter 6.

1.3 Outline

The dissertation is laid out as follows:

In Chapter 2, I go over several methods for solving the Schrödinger equation, including the wave function based Hartree-Fock and post Hartree-Fock methods, as well as Density Functional Theory.

In Chapter 3, I present the theoretical basis for the Quantum Monte Carlo methods used in this thesis, along with a few small enhancements for this work.

In Chapter 4, I describe a new program for electronic structure Quantum Monte Carlo called QWalk, developed primarily by myself. QWalk includes state of the art algorithms and is capable of treating systems with over a thousand electrons.

In Chapter 5, I develop a unified framework for two separate methods of obtaining forces within QMC. This allows me to compare them directly and find which is more robust to a poor wave function. I find that one formulation is superior, perhaps to some surprise, but neither is sufficiently robust to allow application to transition metal oxides.

In Chapter 6, I develop a new application of Bayesian inference to fitting the total energy data of Quantum Monte Carlo reliably. The method not only generates well-defined and correct uncertainties on arbitrary quantities such as the minimum energy point, it also provides a means to estimate the optimal set of data points to minimize the uncertainty.

In Chapter 7, I bring together the method developments of the first few chapters in an application to transition metal monoxide molecules. Here I perform the first application of Reptation Monte Carlo to heavy atoms, which makes possible unbiased estimates of the challenging dipole moment. I find that for binding energies and bond lengths, QMC offers unparalleled accuracy. The dipole moment is more challenging, however, and I offer an examination of how it changes with improved treatment of electronic correlation.

In Chapter 8, I extend the study to extended materials with the ABO3 perovskite structure, focusing on the two ferroelectric materials BaTiO3 and BiFeO3. For the classic ferroelectric BaTiO3, I obtain an estimate of the minimum energy cubic lattice constant in excellent agreement with experiment, which has been a long-standing problem in electronic structure calculations. In the BiFeO3 work, done in collaboration with undergraduate researcher David Sulock, we investigate the cohesive energy and the ferroelectric well depth. We find that the cohesive energy is smaller than in typical perovskites and the well depth is large, which suggests that the material is difficult to grow without defects; such defects will then hamper measurement of the ferroelectric effect.

1.4 Accomplishments

To give a broad overview of the work in this dissertation for nonspecialists, I present a table (Table 1.1) of problems and their resolutions accomplished in my PhD work.

Table 1.1: Significant accomplishments reported in this dissertation

Problem: Fast and general QMC code needed.
Resolution: I wrote the general purpose QMC package QWalk, which is being used by several groups.

Problem: There are many suggested methods for calculating forces within QMC, which are generally of uncertain accuracy and efficiency.
Resolution: I generalized the implementation of the force methods using Reptation Monte Carlo, tested several methods consistently, and found none of them accurate enough on transition metal systems. For finite difference methods, I narrowed the problem down to the Green's function approximation and suggested a way to improve it.

Problem: Quantum Monte Carlo data has stochastic uncertainties, making minimization difficult.
Resolution: I used Bayesian inference to create a consistent framework to handle the stochastic uncertainties and to predict the most efficient distribution of data points.

Problem: The accuracy of fixed-node Diffusion Monte Carlo/Reptation Monte Carlo is not well studied on transition metal oxides.
Resolution: I studied the bond length, binding energy, and dipole moment of several TMO molecules and compared them to experiment. I found that for bond lengths and binding energies, the method is extremely accurate, and that the dipole moment is very sensitive to the treatment of electronic correlation.

Problem: Perovskite lattice constants are often poorly predicted by traditional electronic structure techniques.
Resolution: I used Quantum Monte Carlo to calculate the cubic lattice constant of BaTiO3, which showed that 1) it is possible to perform such a calculation on a challenging transition metal oxide solid while controlling the errors, and 2) QMC is quite accurate, at least on this material, and quite likely also on others.

Problem: The perovskite BiFeO3 has a large dispersion in experimental measurements of the polarization.
Resolution: The QMC-calculated cohesive energy indicates that defect formation is likely, and the large ferroelectric barrier is in line with the large fields seen in experiment. The experimental uncertainty is likely due to defects in the grown material.
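As a toy illustration of the stochastic-minimization problem listed in Table 1.1 (and not the Bayesian machinery of Chapter 6 itself), one can fit a quadratic to synthetic noisy energy-versus-lattice-constant data and resample within the error bars to get an uncertainty on the fitted minimum. All numbers here are made up for illustration; the parametric bootstrap stands in for the posterior width a Bayesian treatment would provide.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical noisy QMC energies around a quadratic minimum at a0 = 4.00
a = np.linspace(3.90, 4.10, 9)
sigma = 0.002                      # stochastic error bar on each energy (Ha)
E_true = 0.5 * 5.0 * (a - 4.00) ** 2 - 10.0
E = E_true + rng.normal(0.0, sigma, size=a.size)

def fit_minimum(a, E):
    """Least-squares quadratic fit E = c2 a^2 + c1 a + c0; minimum at -c1/(2 c2)."""
    c2, c1, c0 = np.polyfit(a, E, 2)
    return -c1 / (2.0 * c2)

# Parametric bootstrap: resample the data within its error bars to
# estimate the uncertainty on the fitted minimum.
mins = [fit_minimum(a, E + rng.normal(0.0, sigma, size=a.size))
        for _ in range(500)]
print(f"a_min = {fit_minimum(a, E):.4f} +/- {np.std(mins):.4f}")
```

The same logic extends to other fitting forms, such as the Morse-like function used for BaTiO3 later in the dissertation.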

Chapter 2

Electronic Structure Methods Other than Quantum Monte Carlo

2.1 Hartree-Fock and Post Hartree-Fock

Hartree-Fock and post Hartree-Fock methods are based on the variational theorem, which states that the energy expectation value of a given wave function, $E = \langle\Psi|\hat{H}|\Psi\rangle$, is always greater than or equal to the ground state energy. Within a given functional form, an approximation to the true ground state wave function is thus the one that minimizes $E$. An excellent treatment of these methods is found in Szabo and Ostlund [22].

Hartree-Fock (HF) takes as the variational wave function a determinant of one-particle orbitals:

$$ \Psi_{HF}(R) = \det M, \tag{2.1} $$

where

$$ M_{ij} = \phi_i(r_j). \tag{2.2} $$

The orbitals are varied to minimize the energy with the constraint that they remain mutually orthogonal. Since the orbitals are three-dimensional, the many-body integrals necessary to obtain the energy and other quantities reduce to sets of three-dimensional integrals.
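The determinant structure of Eqs. (2.1)-(2.2) can be sketched numerically. The two Gaussian-type orbitals below are hypothetical stand-ins chosen only to show the key property of the determinant form: swapping two electron positions flips the sign of the wave function, which is exactly fermionic antisymmetry.

```python
import numpy as np

def slater_det(orbitals, R):
    """Value of the Slater-determinant wave function Psi(R) = det M,
    with M_ij = phi_i(r_j), for N electrons at positions R (shape (N, 3))."""
    M = np.array([[phi(r) for r in R] for phi in orbitals])
    return np.linalg.det(M)

# Two hypothetical Gaussian-type orbitals centered at the origin
phi_s  = lambda r: np.exp(-np.dot(r, r))          # s-like
phi_pz = lambda r: r[2] * np.exp(-np.dot(r, r))   # p_z-like

R = np.array([[0.1, 0.0, 0.3], [-0.2, 0.1, -0.4]])
psi = slater_det([phi_s, phi_pz], R)

# Antisymmetry: exchanging the two electrons changes the sign of Psi
psi_swapped = slater_det([phi_s, phi_pz], R[::-1])
print(np.isclose(psi_swapped, -psi))   # -> True
```

In a real calculation the determinant is never recomputed from scratch at every electron move; efficient codes update its value using rank-one updates of the inverse matrix.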

Typically the orbitals are expanded in some sort of basis:

$$ \phi_i(r) = \sum_k c_k^{(i)} b_k(r). \tag{2.3} $$

The basis functions are usually chosen so that the integrals can be evaluated analytically, which usually boils down to an expansion in plane waves or Gaussian functions. This wave function satisfies fermionic antisymmetry, but does not include any other electronic correlation. The difference between the energy of the Hartree-Fock wave function and the ground state is called the correlation energy, which is typically approximately 5% of the total energy in chemical systems. While this may seem small, most of chemistry occurs through the correlation of electrons, so it is quite important. For this reason, Hartree-Fock is not a very accurate method for chemistry, but it provides a useful starting point for other calculations.

Post HF methods expand the wave function in terms of determinants, giving a wave function of the form

$$ \Psi_{pHF}(R) = \sum_i c_i M_i. \tag{2.4} $$

Each $M_i$ is a determinant of a different set of orbitals, and one can optimize either only the coefficients (Configuration Interaction, or CI) or also the orbitals (MCSCF). While for a complete basis, summing over all possible determinants, post HF is an exact solution, the number of determinants increases combinatorially with the basis set and the solution converges very slowly with respect to the basis set. In practical calculations, only some determinants and a small basis are treated, which limits the accuracy. The convergence with respect to the basis set is so slow because post HF gets at the electronic correlation in a very roundabout way, expanding it in a one-particle basis.

Coupled Cluster [23] is similar in idea to Configuration Interaction, but constructed in such a way as to retain size consistency at all orders. Size consistency ensures that individual atoms and the molecule are treated on the same footing, improving binding energies over CI.
The wave function is given as

$$ e^{\hat{T}} \Psi_{HF}, \tag{2.5} $$

where $\hat{T}$ is the excitation operator. For example, the single excitation operator written in second quantization is $\hat{T}_1 = \sum_{i,a} t_i^a \hat{a}_a^\dagger \hat{a}_i$, which describes a sum of excitations from occupied orbital $i$ to unoccupied orbital $a$ with weight $t_i^a$. $\hat{T}_2$ and so on can be written in the same way, but as sums over more variables. $\hat{T}_2$ is a sum over four indices: two occupied

and two unoccupied, T̂_3 is over six indices, etc. The Coupled Cluster wave function is found by solving for the unknown coefficients t_i^a at each of the different orders of excitation. Whether Coupled Cluster or Configuration Interaction is used, both suffer from poor scaling with respect to the number of electrons in the system (greater than O(N_e^4)) and slow convergence of the energy with respect to the basis set. These disadvantages generally limit their application to relatively small molecular systems.

2.2 Density Functional Theory

Density Functional Theory (DFT) takes a completely different approach: it works in terms of the one-particle density ρ(r). One can prove[24] that there exists a one-to-one mapping between the ground state density and the ground state wave function, and therefore between the density and the ground state energy. Therefore, if we know the mapping E = F[ρ(r)], we can minimize the energy with respect to the three-dimensional density function instead of the many-body wave function. As it turns out, the mapping is a functional, which can depend on integrals of the density, its derivatives, and so on. There is also no prescription for finding the density functional, so it must either be constructed from general principles or fitted. The breakthrough in DFT came when Kohn and Sham[25] wrote an ansatz for the density as a sum over one-particle orbitals:

ρ(r) = Σ_i |φ_i(r)|². (2.6)

This allows the kinetic energy to be evaluated as if the orbitals were from a Slater determinant, which, along with the classical electrostatic energy (also known as the Hartree potential), covers most of the total energy of a system. The energy of a system is then written as

E[ρ] = −(1/2) Σ_i ∫ φ_i*(r) ∇² φ_i(r) dr + (1/2) ∫∫ ρ(r)ρ(r′)/|r − r′| dr dr′ + ∫ ρ(r)V_ext(r) dr + E_xc[ρ]. (2.7)

The missing quantum correlations are folded into the so-called exchange-correlation energy E_xc[ρ]. Any improvement in the accuracy of DFT goes into the improvement of E_xc.
The Local Density Approximation (LDA) approximates the exchange-correlation energy as a local function:

E_xc^LDA[ρ] = ∫ e_xc(ρ(r)) ρ(r) dr, (2.8)

where e_xc(ρ) is a simple function. The canonical version of this is a fit to Quantum Monte Carlo data[26] for the homogeneous electron gas. Further refinements such as the Generalized Gradient Approximation (GGA) have built on the LDA by adding a dependence on the gradient:

E_xc^GGA[ρ] = ∫ e_xc(ρ(r), ∇ρ(r)) ρ(r) dr. (2.9)

This allows for further accuracy, although there is some room for creativity in the formulation of the GGA. In chemistry, the BLYP[27, 28] functional is quite popular, and in solid state the PBE[29] functional is commonly used. Note, however, that there is not a strict ladder of accuracy: the GGAs can be less accurate than the LDA, depending on the formulation and the material. Both LDA and GGA suffer from the self-interaction problem, where each electron interacts with itself. This is seen in the second term of Eqn 2.7, where the electron density interacts with itself; this is valid within classical electrostatics, but not in quantum mechanics. There have been methods to fix this, including LDA+U[30] and the self-interaction correction[31], although there is some arbitrariness in their application. A large collaboration[32] recently tested various corrections to LDA on the MnO solid, and found that they show rather large variation. It is thus difficult to know which of the corrections, if any, are appropriate. Another way to improve the DFT functional is to add some percentage of the Hartree-Fock exchange energy to the functional. There is no theoretical way to know the correct percentage, however, so it is typically fitted to experiment, as in B3LYP, which uses the LYP functional in the 3-parameter fit by Becke[33]. While not strictly a first-principles method, B3LYP is heavily used in the chemistry community as a very accurate semi-empirical model. Unfortunately, for solids, B3LYP often performs much worse than PBE. There are also the new meta-GGAs, which add a dependence on the second derivative of the density.
Of these, TPSS[34, 35] and its semi-empirical hybrid[36] are claimed to be quite accurate, and they will be compared to QMC in Chapter 7. While DFT offers quite reasonable accuracy for a low computational cost, it pays in the form of the unknown density functional. There is no variational theorem among functionals, so it is impossible to know whether one approximate functional is better than another without appealing to experiment, unlike the variational post-Hartree-Fock methods, which can systematically converge to the exact wave function. Also, while it is in principle possible to develop functionals for quantities other than energies, this has not been done much in practice,

and interesting things like correlation functions are unavailable, or have to be estimated by approximating the wave function as a Slater determinant of the one-particle orbitals in the expansion for the density. A step in the right direction for some quantities of interest is to estimate the error[37] in the GGA by calculating a quantity within several different GGAs and examining the variance.
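To make the LDA form of Eq. (2.8) concrete, the sketch below evaluates just the exchange part of e_xc, using the Dirac expression e_x(ρ) = −(3/4)(3/π)^(1/3) ρ^(1/3), on a grid for a made-up one-electron Gaussian model density. The density, grid, and spacing are illustrative assumptions, not taken from the dissertation.

```python
import numpy as np

# LDA exchange: E_x = integral of e_x(rho) * rho dr, with
# e_x(rho) = -(3/4)(3/pi)^(1/3) rho^(1/3) (the Dirac/Slater expression).
C_X = -(3.0 / 4.0) * (3.0 / np.pi) ** (1.0 / 3.0)

def lda_exchange_energy(rho, dv):
    """E_x^LDA as a sum over grid points: sum_i e_x(rho_i) * rho_i * dV."""
    return C_X * np.sum(rho ** (4.0 / 3.0)) * dv

# Model density: a normalized Gaussian holding one electron, on a cubic grid.
n = 60
x = np.linspace(-6.0, 6.0, n)
dx = x[1] - x[0]
X, Y, Z = np.meshgrid(x, x, x, indexing="ij")
rho = np.exp(-(X**2 + Y**2 + Z**2)) / np.pi**1.5

print("N_e =", rho.sum() * dx**3)                  # integrates to ~1 electron
print("E_x =", lda_exchange_energy(rho, dx**3))    # negative, as exchange must be
```

This is only the exchange piece; a full e_xc(ρ) would add the fitted correlation part mentioned in the text.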

Chapter 3

Quantum Monte Carlo

Here, we go over the QMC methods used in the dissertation in detail. Diffusion Monte Carlo and Variational Monte Carlo are covered in more depth in a review article[38].

3.1 Variational Monte Carlo

The expectation value of an arbitrary operator O for a given real trial variational wave function Ψ_T is given by

⟨O⟩ = ⟨Ψ_T|O|Ψ_T⟩ / ⟨Ψ_T|Ψ_T⟩ = ∫ Ψ_T²(R)[OΨ_T(R)/Ψ_T(R)] dR / ∫ Ψ_T²(R) dR,

where R = (r_1, r_2, ..., r_{N_e}) denotes a set of N_e electron coordinates in 3D space. Typically, such integrals are evaluated by reducing the multi-dimensional integral into a set of low-dimensional integrals. Unfortunately, this either restricts the functional form of Ψ_T(R) or makes the calculations intractable for more than a few electrons. One of the key motivations for employing stochastic approaches is to eliminate this restriction and to gain qualitatively new variational freedom for describing many-body effects. In order to evaluate the expectation value integral stochastically, we first generate a set {R_m} of statistically independent sampling points distributed according to Ψ_T²(R) using the Metropolis algorithm. The expectation value is then estimated by averaging over the samples {R_m}. For example, the VMC energy E_VMC is given by the average of the

quantity called the local energy (E_loc):

E_VMC = (1/M) Σ_{m=1}^{M} HΨ_T(R_m)/Ψ_T(R_m) + ε = (1/M) Σ_{m=1}^{M} E_loc(R_m) + ε, (3.1)

with the statistical error ε proportional to 1/√M. It is straightforward to apply the variational theorem in this framework. One starts with a variational wave function Ψ_T(R, P), where R is the set of all the electron positions and P is the set of variational parameters in the wave function:

E(P) = ∫ Ψ_T(R, P) H Ψ_T(R, P) dR / ∫ Ψ_T²(R, P) dR. (3.2)

A (hopefully) good approximation to the ground state is then the wave function with the set of parameters P that minimizes E(P). The stochastic method of integration allows us to use explicitly correlated trial wave functions such as, for example, the Slater-Jastrow form and other functional forms explained later. In fact, as long as the trial function and its derivatives can be evaluated quickly, any functional form can be used. This procedure is broken down into two parts: sampling Ψ_T² while evaluating the energy and other properties, and optimizing the wave function. The first part, sampling Ψ_T², is carried out using the Metropolis-Hastings[39, 40] algorithm. We start with a point R in 3N_e-dimensional space and generate a second point R′ according to the transition probability T(R′ ← R). T is a completely arbitrary function so long as T(R′ ← R) ≠ 0 whenever T(R ← R′) ≠ 0; that is, all moves are reversible. We then accept the move with probability

a = min(1, Ψ_T²(R′) T(R ← R′) / [Ψ_T²(R) T(R′ ← R)]). (3.3)

After a few steps, the distribution converges to Ψ_T², and we continue making moves until the statistical uncertainties are small enough. For atoms with pseudopotentials, we use the moves outlined in Ref [38], modified with a delayed-rejection step similar to Ref [41], although developed independently; for full-core calculations, we use the accelerated Metropolis method from Ref [42]. The total energy and its components are evaluated, as well as other properties.
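The Metropolis sampling of Ψ_T² and the local-energy average of Eq. (3.1) can be sketched on a toy problem. The example below (my own illustration, not QWalk code) uses a 1D harmonic oscillator, H = −½ d²/dx² + ½ x², with trial function Ψ_T(x) = exp(−a x²), for which E_loc(x) = a + (½ − 2a²)x² and the exact variational energy is E(a) = a/2 + 1/(8a).

```python
import numpy as np

rng = np.random.default_rng(0)

def metropolis_sample(a, n_steps=200000, step=1.0):
    """Sample Psi_T^2 = exp(-2 a x^2) with a symmetric Gaussian proposal."""
    x = 0.0
    samples = np.empty(n_steps)
    for i in range(n_steps):
        x_new = x + step * rng.normal()
        # acceptance ratio Psi_T^2(x') / Psi_T^2(x); proposal factors cancel
        if rng.random() < np.exp(-2.0 * a * (x_new**2 - x**2)):
            x = x_new
        samples[i] = x
    return samples

a = 0.4                                   # deliberately not the exact a = 1/2
xs = metropolis_sample(a)
e_loc = a + (0.5 - 2.0 * a**2) * xs**2
print("E_VMC =", e_loc.mean(), " exact E(a) =", a / 2 + 1 / (8 * a))
```

Note that at a = ½, E_loc becomes the constant ½: the statistical error vanishes for an exact eigenstate, which is the zero-variance property exploited by the variance minimization discussed below.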
We then optimize the wave function using a fixed set of sample points. This is done by correlated sampling, as explained in a later section, to obtain small energy (or other property) differences with much greater precision than the total energy. There are many

quantities other than the energy that, upon being minimized, provide an approximation to the ground state wave function. An important one is the variance of the local energy,

σ² = ∫ dR Ψ_T²(R) (E_loc(R) − ⟨E_loc⟩)² / ∫ dR Ψ_T²(R). (3.4)

Since E_loc is a constant when Ψ_T = Φ_0, the variance goes to zero for an exact eigenstate. There are several other possible objective functions, listed in Sec 4.2.2, but variance and energy are generally the only ones optimized in this dissertation.

3.2 Projector Monte Carlo

The variational wave functions are only as accurate as the guessed functional form, whose error is difficult to control in a systematic manner across a variety of systems. In order to minimize this bias we employ the projector method, which projects out the ground state of a given symmetry from any trial wave function. To do this, we follow the time propagation of the so-called imaginary-time Schrödinger equation

−d|Ψ(τ)⟩/dτ = (H − E_0)|Ψ(τ)⟩. (3.5)

The time evolution operator is e^{−(H − E_0)τ}, and E_0 will be determined self-consistently. For any state |Ψ(0)⟩ with a non-zero overlap with the ground state |Φ_0⟩, we can expand in a spectral decomposition in the eigenkets:

e^{−(H − E_0)τ}|Ψ(0)⟩ = c_0|Φ_0⟩ + Σ_{i=1} c_i e^{−(E_i − E_0)τ}|Φ_i⟩. (3.6)

Since the ground state has the lowest energy, all states except Φ_0 are filtered out in the limit τ → ∞. For small τ, we can write the projection operator in the R-representation as

⟨R′|e^{−(H − E_0)τ}|R⟩ = G(R′, R, τ) ≈ exp(−(R − R′)²/2τ) exp(−(τ/2)(V(R) + V(R′) − 2E_0)) (3.7)

(we will write symmetric Green's functions with a comma, and non-symmetric Green's functions with an arrow indicating the direction in which they act). This can be interpreted as a dynamic diffusion kernel G_D(R′, R, τ) = exp(−(R − R′)²/2τ) times a branching kernel G_B(R′, R, τ) = exp(−(τ/2)(V(R) + V(R′) − 2E_0)). The basic idea of projector Monte Carlo is

to sample a path G(R_N, R_{N−1}, τ) ··· G(R_2, R_1, τ)Ψ_T(R_1), that is, to work in an extended space of dimension 3N_e N. For N large enough, the distribution of R_N will approach Φ_0. However, to interpret this as a stochastic process, the path distribution must be positive; that is, the product of all G's with Ψ_T must be positive. This gives rise to the fixed-node approximation, where the nodes (the places where the wave function equals zero) of the ground state wave function are approximated by the nodes of the trial wave function. One can avoid this restriction by performing a released-node calculation[26], although the price is a change from polynomial to exponential scaling with system size. In actual calculations, we perform an importance-sampling transformation, where G(R′, R, τ) is replaced by the importance-sampled Green's function

G(R′ ← R, τ) = Ψ_T(R′) G(R′, R, τ) / Ψ_T(R). (3.8)

The dynamic part of the Green's function then becomes

G_D(R′ ← R, τ) = exp(−(R′ − R − τ∇ln Ψ_T(R))²/2τ), (3.9)

a drifting Gaussian, and the branching part becomes

G_B(R′, R, τ) = exp(−(τ/2)(E_L(R) + E_L(R′) − 2E_0)), (3.10)

both of which are much better behaved stochastically, since the force ∇ln Ψ_T(R) biases the walk toward where the wave function is large, and the local energy E_L(R) is much smoother than the potential energy. This transformation is easiest to see by multiplying both sides of Eqn 3.5 by Ψ_T and rewriting the differential equation in terms of f(R, τ) = Ψ_T(R)Ψ(R, τ). The application of the importance-sampled Green's function is

f(R′, τ) = ∫ G(R′ ← R, τ) Ψ_T²(R) dR
= ∫ [Ψ_T(R′) G(R′, R, τ)/Ψ_T(R)] Ψ_T²(R) dR
= Ψ_T(R′) ∫ G(R′, R, τ) Ψ_T(R) dR
= Ψ_T(R′) Ψ(R′, τ).

Therefore, if we generate the path G(R_N ← R_{N−1}, τ) ··· G(R_2 ← R_1, τ)Ψ_T²(R_1), then for large enough N, the distribution of R_N is Ψ_T(R_N)Φ_0(R_N), which is called the mixed distribution. The ground state energy is obtainable by evaluating the integral

∫ Ψ_T Φ_0 [HΨ_T/Ψ_T] dR / ∫ Ψ_T Φ_0 dR = ∫ Φ_0 HΨ_T dR / ∫ Ψ_T Φ_0 dR = E_0, (3.11)

since Φ_0 is an eigenstate of H within the nodal boundaries. We use two versions of the projector method: Diffusion Monte Carlo, which has the advantage that the large-N limit is easily obtained, and Reptation Monte Carlo, which makes the pure distribution Φ_0² available. Note that in Monte Carlo, the extra time dimension does not cause an extra cost in efficiency over VMC, since only the variance of quantities matters for efficiency, not the number of dimensions. Projector Monte Carlo is typically about ten times slower than VMC, though, because the short-time Green's function approximation requires us to move through space much more slowly than in VMC.

3.2.1 Diffusion Monte Carlo

Diffusion Monte Carlo has been discussed by many authors (for a review, see Refs [38, 42]); it attains the distribution f(R, τ) by starting with a distribution of Ψ_T² and interpreting the action of the Green's function as a stochastic process with killing and branching, eventually ending up with the mixed distribution Ψ_T Φ_0. It has the advantage that the τ → ∞ limit is easy to achieve, but the disadvantage of not having access to the pure distribution Φ_0². To elucidate the Diffusion Monte Carlo algorithm, we start with a number of walkers distributed according to Ψ²(R, 0) = Ψ_T²(R). We then take one Monte Carlo step using the importance-sampled Green's function, which is separated into the dynamic and branching parts, as above. The dynamic Green's function is the same as the one for Brownian motion with a drift term, so each walker moves according to the quantum force τ∇ln Ψ_T(R), then diffuses. The branching part of the Green's function is not a normalizable stochastic process, so it becomes an averaging weight, which is accumulated over the simulation. Note that if E_0 is the average value of the local energy, then the weights will average to one. Thus, we can self-consistently determine E_0 as the number that causes the average weight to be one, adjusting it over the course of the simulation.
Alternatively, E_0 can be found by averaging the local energy over the mixed distribution. The core of the DMC algorithm is in these two rules for updating the positions and weights of walkers while adjusting E_0 to keep the average weight equal to one; the rest is improving the efficiency. After many steps, the weights begin to diverge, with some becoming very large and some becoming very small. The highest efficiency occurs when all the weights are near one, so the walkers with large weights are branched, split in two each with half the weight, and the walkers with small weights are killed. This can be done in a stochastically

Figure 3.1 (two panels: "No importance sampling" and "Importance sampling"): 100-walker DMC simulation for a one-dimensional harmonic oscillator. The horizontal axis is the position coordinate and the vertical axis is the time coordinate, moving upwards. The non-importance-sampling method samples the probability distribution Φ_0 = exp(−½x²), while the importance-sampling method samples Ψ_T Φ_0 = exp(−(2/5)x²) exp(−½x²) (the trial function was input) as the time increases. Since without importance sampling the only dynamical move is to diffuse randomly, the walkers are forced to the center by killing at the sides. The importance-sampling transformation biases the dynamics so the walkers stay in the center, reducing the killing and increasing the efficiency, while averaging over the smooth local energy instead of the potential energy.

correct way by choosing two walkers, one with a large weight w_1 and the other with a small weight w_2. With probability w_1/(w_1 + w_2), walker 1 is branched and walker 2 is killed, each copy taking on the weight (w_1 + w_2)/2. Otherwise, walker 2 is branched and walker 1 is killed, with each copy again having the same weight. This control of the weights creates a very small bias that goes to zero as the number of walkers goes to infinity. Therefore, we choose a sizable number of walkers, approximately 1000, to keep this error to a minimum. The extra walkers do not hurt efficiency, since they serve as averaging points that reduce the stochastic uncertainty. After roughly 100 steps the distribution is converged, and the energy can be obtained by averaging over many generations of walkers. Another large efficiency improvement can be made in the dynamic part of the Green's function. The exact non-importance-sampled Green's function must be symmetric, so G(R′, R, τ) = G(R, R′, τ). Since the importance-sampled Green's function satisfies

G(R′ ← R, τ) = Ψ_T(R′) G(R′, R, τ) / Ψ_T(R), (3.12)

it must then satisfy

G(R′ ← R, τ) Ψ_T(R)/Ψ_T(R′) = G(R ← R′, τ) Ψ_T(R′)/Ψ_T(R), (3.13)

which can be rewritten as

G(R′ ← R, τ) Ψ_T²(R) = G(R ← R′, τ) Ψ_T²(R′). (3.14)

This is simply the detailed balance condition, which can be enforced the same way as in VMC by introducing an acceptance step with the same probability as Eqn 3.3, using only the dynamic Green's function, since the branching part is already symmetric. This modification has been shown[42] to have very small time-step errors, and has the pleasing result that the DMC algorithm is essentially VMC dynamics at a small time step with reweighting according to the branching Green's function.
Other than only obtaining the mixed distribution (and not the pure distribution Φ_0²), a subtle limitation of DMC is that the branching process spoils calculations of imaginary-time correlations and can decrease the efficiency of the simulation if there is too much branching. Even with these limitations, DMC is probably the most efficient way to obtain the fixed-node approximation to the ground state energy.
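The DMC rules just described (drift, diffusion, branching on the local energy, and the E_0 feedback) can be sketched for the same 1D harmonic oscillator toy problem used above, with Ψ_T(x) = exp(−a x²) and a = 0.4, deliberately not the exact ground state. This is a minimal illustration with made-up parameters, not the production algorithm: it omits the accept/reject step on the dynamic Green's function, so it carries a small time-step error.

```python
import numpy as np

rng = np.random.default_rng(2)

a, tau = 0.4, 0.02

def e_loc(x):
    # local energy for Psi_T = exp(-a x^2), H = -1/2 d^2/dx^2 + 1/2 x^2
    return a + (0.5 - 2.0 * a**2) * x**2

def dmc(n_target=2000, n_steps=3000, n_equil=500):
    x = rng.normal(scale=np.sqrt(1.0 / (4 * a)), size=n_target)   # ~ Psi_T^2
    e_ref = e_loc(x).mean()
    energies = []
    for step in range(n_steps):
        # drift along the quantum force -2 a x, then diffuse
        x_new = x + tau * (-2.0 * a * x) + np.sqrt(tau) * rng.normal(size=x.size)
        # symmetrized branching weight, Eq. (3.10) with E_0 -> e_ref
        w = np.exp(-tau * (0.5 * (e_loc(x) + e_loc(x_new)) - e_ref))
        copies = (w + rng.random(x.size)).astype(int)   # stochastic branch/kill
        x = np.repeat(x_new, copies)
        e_ref += 0.1 * np.log(n_target / max(x.size, 1))  # population feedback
        if step >= n_equil:
            energies.append(e_loc(x).mean())
    return float(np.mean(energies))

e_dmc = dmc()
print("E_DMC =", e_dmc)   # close to the exact ground state energy 1/2
```

Even though the mixed estimator averages E_loc of an imperfect Ψ_T, it converges to the exact E_0 here because the 1D ground state has no node, so there is no fixed-node error.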

3.2.2 Reptation Monte Carlo

For local operators that do not commute with the Hamiltonian, we use Reptation Monte Carlo[20] with the bounce algorithm[43]. We sample the path distribution

Π(s) = Ψ_T(R_0) G(R_0, R_1, τ) ··· G(R_{n−1}, R_n, τ) Ψ_T(R_n), (3.15)

where s = [R_0, R_1, ..., R_{n−1}, R_n] is a projection path. This can be interpreted in several different ways. If we examine the distribution at R_0, we can view the sampled Green's functions as acting on Ψ_T(R_n), and therefore the probability distribution of R_0 is P(R_0) = Ψ_T(R_0)Φ_0(R_0); this is the same distribution as we obtain in DMC as the path length goes to infinity. Alternatively, since G is symmetric under exchange of its two R coordinates, the probability distribution of R_n is the same. Finally, we can split the path in two, one half projecting on Ψ_T(R_0) and the other projecting on Ψ_T(R_n). We then have

P(R_{n/2}) = (G(R_{n/2}, R_{n/2−1}) ··· G(R_1, R_0) Ψ_T(R_0)) × (G(R_{n/2}, R_{n/2+1}) ··· G(R_{n−1}, R_n) Ψ_T(R_n)) → Φ_0²(R_{n/2})

for n → ∞, which allows us to obtain correct expectation values of operators that do not commute with the Hamiltonian. We thus evaluate the energy as E_RMC = ⟨[E_L(R_0) + E_L(R_n)]/2⟩ and operators not commuting with H as ⟨O⟩_RMC = ⟨O(R_{n/2})⟩. Reptation Monte Carlo does not include branching; it instead uses an acceptance/rejection step. This is a tradeoff: we can project only for a finite τ, since otherwise the probability distribution function is not normalizable, but we gain access to the pure distribution and imaginary-time correlations. The algorithm is realized as follows: we first start with a single walker R_0 distributed according to Ψ_T². We then apply the importance-sampled dynamic Green's function to the walker n times, while keeping the path in memory. This results in a path s = [R_0, R_1, ..., R_{n−1}, R_n] distributed close to the distribution we wish to generate, missing only the branching part of the Green's function.
Next, we choose a direction, forwards or backwards. Suppose we choose forwards. We generate a new position X in 3N_e-dimensional space by applying the importance-sampled dynamic Green's function starting from the end R_n, forming a new path s′ = [R_1, R_2, ..., R_n, X]. The idea is to move the path like a snake (thus the name reptation, from reptile). The move is then accepted or

rejected according to the Metropolis algorithm:

a = min(1, Π(s′) T(s ← s′) / [Π(s) T(s′ ← s)]). (3.16)

The acceptance reduces to a very simple expression. First, we look at the ratio of the desired distributions, Π(s′)/Π(s). By writing down Eqn 3.15 for each path and canceling like terms, we obtain

Π(s′)/Π(s) = Ψ_T(R_1) G(R_n, X, τ) Ψ_T(X) / [Ψ_T(R_0) G(R_0, R_1, τ) Ψ_T(R_n)]. (3.17)

The transition probability T(s′ ← s) is the probability of generating the new point, G_D(X ← R_n, τ), since we used the importance-sampled dynamics. The reverse move is similarly G_D(R_0 ← R_1, τ), since we could have gone backwards. Since

G(R′ ← R, τ) = G_D(R′ ← R, τ) G_B(R′, R, τ) = Ψ_T(R′) G(R′, R, τ)/Ψ_T(R), (3.18)

after substitution for G_D and cancellation of common terms, we end up with

a = min(1, exp(−(τ/2)(E_L(R_n) + E_L(X))) / exp(−(τ/2)(E_L(R_0) + E_L(R_1)))), (3.19)

the ratio of the branching parts of the importance-sampled Green's function. In retrospect, this might have been expected: since we generated the new point with the dynamic part, all that is left is the branching part. Unlike DMC, RMC is a pure Metropolis algorithm; the walk is just in the path space. This is a big advantage in terms of parallel scalability and simplicity of implementation. The biggest disadvantage is that, since the path can move forward or backward at each step, the larger n is, the more steps it takes to generate a new path. This can be remedied to a large extent by the bounce[43] modification, but for very long paths it still becomes slower. The bounce algorithm makes one minor modification to the algorithm above: rather than choosing the direction randomly at each step, it continues in the same direction until a step is rejected. This does not satisfy detailed balance, but the transfer matrix does have the correct distribution as an eigenfunction, which is sufficient to show that the bounce algorithm generates Π(s).
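The reptation move, the acceptance ratio (3.19), and the bounce rule can be sketched for the same toy harmonic oscillator (Ψ_T(x) = exp(−a x²), a = 0.4). The path length, time step, and sweep counts below are illustrative assumptions; like the DMC sketch, this simplified version inherits a small time-step bias from the drift-diffusion kernel.

```python
import numpy as np

rng = np.random.default_rng(3)

a, tau, n = 0.4, 0.02, 200     # trial exponent, time step, path length

def e_loc(x):
    return a + (0.5 - 2.0 * a**2) * x**2

def grow(x):
    """One step of the drifting-Gaussian dynamic kernel G_D (Eq. 3.9)."""
    return x + tau * (-2.0 * a * x) + np.sqrt(tau) * rng.normal()

# build an initial path by repeated drift-diffusion
path = np.empty(n + 1)
path[0] = 0.0
for i in range(n):
    path[i + 1] = grow(path[i])

direction = +1
mid_samples = []
for step in range(400000):
    if direction == +1:                      # grow the head, drop the tail
        x_new = grow(path[-1])
        log_a = -0.5 * tau * (e_loc(path[-1]) + e_loc(x_new)
                              - e_loc(path[0]) - e_loc(path[1]))
        if np.log(rng.random()) < log_a:
            path = np.append(path[1:], x_new)
        else:
            direction = -1                   # the bounce
    else:                                    # same move in reverse
        x_new = grow(path[0])
        log_a = -0.5 * tau * (e_loc(path[0]) + e_loc(x_new)
                              - e_loc(path[-1]) - e_loc(path[-2]))
        if np.log(rng.random()) < log_a:
            path = np.append(x_new, path[:-1])
        else:
            direction = +1
    if step >= 20000 and step % 10 == 0:
        mid_samples.append(path[n // 2] ** 2)

x2_pure = float(np.mean(mid_samples))
print("<x^2> at path middle =", x2_pure)
```

The middle of the path samples the pure distribution Φ_0² ∝ exp(−x²), for which ⟨x²⟩ = 1/2, whereas the mixed distribution at the ends, Ψ_T Φ_0 ∝ exp(−0.9x²), would give ⟨x²⟩ ≈ 0.556; the difference is exactly the mixed-estimator bias that RMC removes for operators not commuting with H.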
This increases the efficiency of RMC tremendously, because instead of the path moving one step right, then one step left while getting nowhere, it moves perhaps 100 steps to the right, replacing the entire path, then 100 to the left, generating a whole new path. In our calculations, the RMC slowdown appears at approximately 150 electrons; below that, the efficiency is almost identical to DMC for the ground state

energy, while also offering an unbiased estimate of quantities that do not commute with the Hamiltonian. The only cost is that the projection length must be chosen long enough that all quantities are converged, but short enough that the path does not slow down. This is not too much of a restriction, because the cost of the calculation only rises when the path is not completely replaced between bounces. We mentioned that RMC obtains unbiased estimates only for local operators. This is only partially true: nonlocal operators are possible, but difficult. To evaluate a general operator O at the center of the path in RMC, we calculate the integral

⟨O⟩ = ∫ dR_0 ··· dR_n Ψ_T(R_0) G(R_0, R_1) ··· G(R_{n/2−1}, R_{n/2}) O G(R_{n/2}, R_{n/2+1}) ··· G(R_{n−1}, R_n) Ψ_T(R_n),

where we have suppressed the τ dependence and the normalization over paths. If the operator is local, it is simply evaluated by the procedure explained above. If it is not, then one must evaluate O G(R_{n/2}, R_{n/2+1}) in one way or another. This has little relevance for the dissertation, but is discussed at length for finite-temperature path integral calculations[44].

3.3 Pseudopotentials

Heavy atoms cause a problem for QMC because of the high-energy core region. There, the large potential energy causes the core electrons to have a large kinetic energy, and the local energy fluctuates wildly. The core electrons contribute very little to the chemical properties of a material, but make the calculation much more difficult. This problem has been solved in Hartree-Fock, DFT, etc., by removing the core electrons and introducing a pseudopotential to replace them. The end result is a pseudo-atom with only valence electrons that mimic the behavior of the valence electrons of the real atom. This is done by adding to the Hamiltonian a local term, which does not cause problems, and a term that depends on the angular momentum l:

V_pseudo = Σ_l V_l(r) |l⟩⟨l|. (3.20)

The projection operator |l⟩⟨l| mimics the action of the core electrons, since s core electrons interact differently with other s electrons than with other angular momenta, and so on. While the pseudopotential has the desired effect of removing the core electrons, the projection operator introduces nonlocality in the potential energy, which in turn makes the stochastic interpretation of the imaginary-time Schrödinger equation difficult. This has recently been overcome[45] within DMC and RMC, although the work in this thesis uses a simpler method from Mitas et al.[46]. Instead of using the nonlocal pseudopotential Hamiltonian, we use the localized potential

V_loc(R) = V_pseudo Ψ_T(R) / Ψ_T(R),

which, like the local energy, is a local function of R. The DMC (or RMC) calculation is then performed normally. This approximation is called the locality approximation, and is exact when Ψ_T is the ground state wave function. Its error converges as the square of the error in Ψ_T, so for a good Jastrow factor, the error is generally fairly small, on the order of the fixed-node error or less.

3.4 Periodic Boundary Conditions

For molecules, the implementation of QMC methods is fairly straightforward, but for infinite (periodic) systems, several techniques are needed to approach the infinite limit. The complicating factor for solids is that the real system we are trying to simulate is a molecule containing on the order of 10²³ particles, distributed in some regular way. If we enforce periodic boundary conditions, where an electron escaping the simulation cell reappears on the other side, we obtain a good approximation to the very large molecule that we call a solid.

3.4.1 Finite Size Error

The error due to the finite simulation cell can, to a good approximation[47], be separated into two terms: the independent-particle finite size effect (IPFSE) and the Coulomb finite size effect. The IPFSE is related to the kinetic energy, and can be addressed by sampling k-points.
It has previously been noted that using a k-point other than the Γ point leads to smaller finite size error[48], but in the spirit of twisted boundary conditions[49], we sample over all the k-points that can be written as real functions. This corresponds to the Γ point (0,0,0) and the k-points at the edge of the Brillouin zone. More k-points can

be added at the cost of complex arithmetic by using the fixed-phase Diffusion Monte Carlo algorithm[50]. We did not find that necessary, since the 2x2x2 cell with all real k-points was sufficient for the calculations in this work. For a cubic cell with lattice constant a, there are four real independent k-points: Γ(0,0,0), X(π/a,0,0), R(π/a,π/a,0), and L(π/a,π/a,π/a). By counting the number of symmetry-related k-points, we can assign respective weights of 1/8, 3/8, 3/8, and 1/8. It is somewhat more convenient to weight them all the same in DMC, since then the optimal errors are obtained by performing all of the calculations for the same length of time. It turns out that setting the weights to one does not affect the calculated energy significantly, at least for the perovskite materials studied here (see Fig 3.2). For cases in which the weights do matter, the error bar is given by

ε = (1/Σ_i w_i) √(Σ_i w_i² ε_i²), (3.21)

where the sum is over all k-points treated. The error bar ε_i is proportional to 1/√T_i, where T_i is the computer time spent on that k-point. Therefore, we must minimize the function

Σ_i w_i²/T_i (3.22)

subject to the constraint Σ_i T_i = T, the total amount of computer time spent. For the cubic case, this reduces to finding the minimum of 1/x + 9/(T − x) (setting T_1 = T_4 = x and T_2 = T_3 = T − x), which gives approximately x = 0.2T, although the minimum is very shallow. Therefore, optimally, one should spend about four times longer on the k-points with weight three than on those with weight one. Spending equal time (x = 0.5T) results in an efficiency penalty of about 20%. The Coulomb finite size effect only appears in many-body simulations, and is the result of the interaction of an electron with its periodic images. With PBCs, instead of an infinite solid with all the electrons moving more or less independently, we simulate lattices of electrons interacting with each other.
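The time-allocation argument of Eqs. (3.21)-(3.22) can be cross-checked numerically. Under the assumption ε_i ∝ 1/√T_i, the minimum of Σ w_i²/T_i at fixed total time is the standard Lagrange-multiplier result T_i ∝ w_i; the sketch below (my own illustration, using the four real-k-point weights from the text) compares that optimum against equal time per k-point.

```python
import numpy as np

w = np.array([1.0, 3.0, 3.0, 1.0])      # weights of the four real k-points

def combined_error(T):
    # Eq. 3.21 with eps_i proportional to 1/sqrt(T_i), up to a common prefactor
    return np.sqrt(np.sum(w**2 / T)) / w.sum()

T_opt = w / w.sum()                     # Lagrange optimum: T_i proportional to w_i
T_equal = np.full(4, 0.25)              # equal time on every k-point

print("optimal time fractions:", T_opt)
print("error(equal)/error(optimal) =", combined_error(T_equal) / combined_error(T_opt))
```

Any perturbation away from T_opt increases the combined error, confirming that the penalty for equal time is modest, consistent with the shallow minimum noted in the text.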
For large enough cells, the lattices are spaced far enough apart that this is a good approximation, but the Coulomb finite size error precludes simulation at the primitive cell size for most materials. There have been various attempts to correct this size effect in a consistent way[47, 51], but we use a heuristic that seems to work about as well. For the homogeneous electron gas, the potential energy scales as 1/r_s, where r_s is the average separation between electrons. Therefore, the potential-energy finite size error should scale in the same way, with some prefactor.

Figure 3.2: Finite size scaling of cubic BaTiO_3 using a Slater determinant of LDA orbitals in VMC without a Jastrow factor. Energy per cell (Hartrees) versus supercell size (nxnxn). The curves compare averaged real k-points with the Coulomb correction, weighted real k-points with the Coulomb correction, the L point with and without the Coulomb correction, and LDA (2x2x2 k-mesh, shifted). The averaged and weighted k-points are over the four real k-points in the cubic cell.

By doing finite size extrapolations for several materials, Mitas (unpublished) arrived at a correction equal to 0.36/r_s Hartrees for most solids, which appears to work quite well. Note that this correction is the same for all cell sizes, so its relative importance decreases as the inverse of the system size, as is generally expected. Fig 3.2 shows the performance of this scheme, where we see that it performs quite well, converging to within less than 10 mHa per cell already at the 2x2x2 cell. Corrections using the LDA, which have been used in the past, are actually in the wrong direction.

3.5 Summary

QMC makes as few approximations as possible, and we have at least an idea of how each approximation could be removed, even where it is infeasible to do so. We give a list of

Table 3.1: List of approximations in the QMC method. They are decomposed into systematic approximations, which can be resolved by changing parameters in the simulation, and methodological approximations, which are inherent in the method. In principle, it is known how to remove each of these approximations, although it may be prohibitively expensive to do so for a large system.

Systematic approximations:
- Small time step expansion for the Green's function. Resolution: extrapolate to zero time step.
- Population control in DMC. Resolution: add more walkers.
- Finite projection length in RMC. Resolution: extrapolate to infinite path length.
- Finite size simulation cell in periodic boundary conditions. Resolution: sample k-points and extrapolate to infinite size.

Methodological approximations:
- Fixed-node approximation. Resolution: vary the nodal structure to minimize the fixed-node energy; this is still an approximation. Only completely resolvable by exact projection, which scales exponentially.
- Pseudopotential transferability error. Resolution: check against full-core calculations in DFT or Hartree-Fock; mostly cancels out in energy differences.
- Localization approximation for pseudopotentials. Resolution: use Casula's algorithm[45] or a very good trial wave function; the Jastrow factor helps. Typically quite a small error.

the approximations made and their resolutions in Table 3.1. The only approximations that cannot be reduced to a negligible contribution are the fixed-node approximation and the pseudopotential approximation. The fixed-node approximation can be improved with elaborate trial wave functions, as will be done for the transition metal monoxides, but for large systems, we are for the moment stuck with the nodes of a Slater determinant of one-particle orbitals. As we shall see, even this variational freedom is enough to obtain quite good accuracy. The pseudopotentials must be tested carefully; all the work here uses small-core pseudopotentials to maximize transferability.

Chapter 4

QWalk: An Implementation of Quantum Monte Carlo

We have developed a new program, QWalk, for QMC calculations, written in C++ with modern programming techniques and incorporating state-of-the-art algorithms in a fast and scalable code. QWalk has already been used in several publications[7, 8, 52, 53, 54], not including the work in this dissertation.

4.1 Organization and Implementation

The code is written in a combination of object-oriented and procedural techniques. The object-oriented approach is coarse-grained, creating independent sections of code that are written efficiently in a procedural fashion. It is extremely modular; almost every piece can be removed and replaced with another. A contributor of a module only has to change one line in the main code to allow use of a new module. This allows for flexibility while keeping the code base relatively simple and separable. The modular structure also allows for partial rewrites of the code without worrying about other parts. In fact, each major module has been rewritten several times in this manner as we add new features and refactor the code. For the user, this structure shows itself in flexibility.
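The pluggable design described above can be pictured with a stripped-down sketch; the class names below are illustrative stand-ins, not QWalk's actual interfaces:

```cpp
#include <cassert>
#include <vector>

// Minimal stand-in for a point in electron configuration space.
typedef std::vector<double> Sample;

// Abstract interface: any wave function module implements value().
struct Wavefunction {
  virtual double value(const Sample& r) const = 0;
  virtual ~Wavefunction() {}
};

// A "Multiply"-style module combines two wave functions into their
// product, e.g. a Slater determinant times a Jastrow factor.
struct Multiply : Wavefunction {
  const Wavefunction *wf1, *wf2;
  Multiply(const Wavefunction* a, const Wavefunction* b) : wf1(a), wf2(b) {}
  double value(const Sample& r) const { return wf1->value(r) * wf2->value(r); }
};

// Toy constant "wave function" for demonstration only (not physical).
struct ConstWf : Wavefunction {
  double c;
  ConstWf(double cc) : c(cc) {}
  double value(const Sample&) const { return c; }
};

// Demonstrates that the combining module never needs to know what
// concrete wave functions it was handed.
double demo_product(double a, double b) {
  ConstWf w1(a), w2(b);
  Multiply m(&w1, &w2);
  return m.value(Sample());
}
```

The point of the design is that a method like VMC talks only to the abstract interface, so a new ansatz plugs in without touching the method code.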

Figure 4.1: Calculation structure for the VMC method on a molecule using a Slater-Jastrow wave function. [Tree from most to least abstraction: Method (VMC); below it, System (Molecule), Wavefunction_type (Multiply), and Dynamics generator (Delayed Rejection); below the wave function, Wavefunction_data (Slater Determinant) and Wavefunction_data (Jastrow factor); then Molecular orbital evaluator (Linear), One-body Jastrow, and Two-body Jastrow; at the bottom, Basis_function (Gaussian) and Basis_function (cutoff Pade).]

Table 4.1: The central objects of the code and their physical correspondents.
- System: parameters and form of the Hamiltonian
- Sample point: R, the integration variables
- Wave function type: wave function ansatz
- Wave function: Ψ_T(R), ∇Ψ_T(R), ∇²Ψ_T(R)
- Dynamics generator: Metropolis step, a.k.a. Green's function, a.k.a. trial move

The modules form a tree of successive abstractions (Fig 4.1). At the top of the tree is the QMC method, VMC in this case. It works only in terms of the objects directly below it, which are the concepts of System, Wave function data, etc. (see Table 4.1). These in turn may have further abstractions below them, as we've shown for the wave function object. The highest wave function object is of type Multiply, which uses two wave function types to create a combined wave function. In this case, it multiplies a Slater determinant by a Jastrow correlation factor to form a Slater-Jastrow function. Since the wave functions are pluggable, the Slater determinant can be replaced with any antisymmetric function, and likewise the Jastrow factor. In Fig 4.1, each type is listed along with the specific instance of that type in parentheses; at each level, the part in parentheses could be replaced with another module of the same type.

We present an implementation of the VMC algorithm as an example of how the code is organized (Fig 4.2). For reasons of space, we do not write the function line-by-line, which would include monitoring variables, etc., but instead give a sketch of the algorithm. The VMC method works at the highest level of abstraction, only in terms of the wave function, system, and random dynamics. It does not care what kind of system, wave function, etc. are plugged in, only that they conform to the correct interfaces. In Appendix C, we give an example of how to create a new module. We will now provide a listing of the available modules for the major types, along with some details of their implementation.

4.2 Methods

Variational Monte Carlo

The VMC module implements the Metropolis method to sample the probability density Ψ_T²(R). The method has been described in some detail in Sec 3.1; the implementation is more or less a direct translation.
Beyond the basic algorithm, it implements correlated sampling, as explained in Sec, for small energy differences between very similar systems, and it can evaluate many properties, including the one-particle density, the structure factor (of periodic systems), and the dipole moment (of finite systems).

Vmc_method::run(vector <string> & vmc_section,
                vector <string> & system_section,
                vector <string> & wavefunction_section) {
  //Allocate the objects we will be working with
  System * sys=NULL;
  allocate(sys, system_section);
  Wavefunction_data * wfdata=NULL;
  allocate(wfdata, sys, wavefunction_section);
  Sample_point * sample=NULL;
  sys->generateSample(sample);
  Wavefunction * wf=NULL;
  wfdata->generateWavefunction(wf);
  //the Sample_point will tell the Wavefunction when we move an electron
  sample->attachWavefunction(wf);
  sample->randomGuess();
  //This is the entire VMC algorithm
  for(int s=0; s< nsteps; s++) {
    for(int e=0; e < nelectrons; e++) {
      dynamics_generator->sample(e,timestep,wf,sample);
    } //end electron loop
    //gather averages
  } //end step loop
  //report final averages
}

Figure 4.2: Simple VMC code

Table 4.2: Optimization objective functions implemented.
- Variance: ⟨(E_L(R) − E_ref)²⟩
- Absolute value: ⟨|E_L(R) − E_ref|⟩
- Lorentz: ⟨ln(1 + (E_L(R) − E_ref)²/2)⟩
- Energy: ⟨E_L(R)⟩
- Mixed: a·Energy + (1 − a)·Variance, 0 < a < 1

Optimization of Wave Functions

We have implemented three different methods for optimization. All methods are capable of optimizing several objective functions (see Table 4.2). Any of these objective functions will obtain the correct ground state with an infinitely flexible function, but they may find different minima for incomplete wave functions, and some are easier to optimize than others.

The first (OPTIMIZE) is based on Umrigar et al.'s[55] variance optimization. The method minimizes the objective function on a set of fixed configurations from VMC using a conjugate gradient technique, usually without reweighting the averages as the wave function changes. Optimizing the energy using OPTIMIZE is quite expensive, because it requires many configurations to evaluate an unbiased estimate of the energy derivative.

The next two are based on Umrigar and Filippi's Newton optimization[56] method. OPTIMIZE2 also uses a fixed set of configurations, but instead of evaluating only the first derivatives of the objective function, as conjugate gradients do, it uses a low-variance estimator for the Hessian matrix and Newton's method to find the zeros of the first derivatives. OPTIMIZE2 is able to produce better wave functions with lower energies than OPTIMIZE by directly optimizing the energy, even for very large systems (we have applied it to up to 320 electrons), while costing only slightly more. This method was used throughout this work to optimize a linear combination of 95% energy and 5% variance. Finally, NEWTON OPT uses a fixed set of configurations to evaluate only a single Hessian matrix, then evaluates the optimal length of the optimization step using VMC correlated sampling, as suggested by Cyrus Umrigar.
This method is able to find the very lowest energy wave functions, since the configurations are regenerated every optimization step. However, NEWTON OPT is much more expensive than the other two methods.
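The 95%/5% energy-variance mixture used throughout this work is simple to state as code; a minimal sketch over a fixed set of local energies (the helper name is ours, not QWalk's OPTIMIZE2 implementation):

```cpp
#include <vector>
#include <cassert>
#include <cmath>

// Mixed objective a*<E_L> + (1-a)*<(E_L - E_ref)^2>, evaluated over
// local energies E_L(R_i) from a fixed set of VMC configurations.
// a = 0.95 reproduces the 95% energy / 5% variance mixture.
double mixed_objective(const std::vector<double>& elocal,
                       double eref, double a = 0.95) {
  double e = 0, v = 0;
  for (size_t i = 0; i < elocal.size(); i++) {
    e += elocal[i];
    v += (elocal[i] - eref) * (elocal[i] - eref);
  }
  e /= elocal.size();
  v /= elocal.size();
  return a * e + (1 - a) * v;
}
```

Setting a = 1 or a = 0 recovers the pure energy or pure variance objectives of Table 4.2.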

Diffusion Monte Carlo

DMC is implemented almost identically to VMC, except that the time step is typically much smaller and each walker accumulates a weight equal to exp(−(τ_eff/2)(E_L(R′) + E_L(R) − 2E_ref)). Since we use an acceptance/rejection step, τ_eff is chosen somewhat smaller than τ as τ_eff = pτ, where p is the acceptance ratio. To control the fluctuations in the weights, we employ a constant-walker branching algorithm, which improves the parallel load-balancing properties of DMC. Every few steps, on each node, we choose a set of walkers that have large weights (w_1) for branching. Each of these walkers is matched with a smaller-weight walker (w_2) which is due for killing. With probability w_1/(w_1 + w_2), the large-weight walker is branched and the small-weight walker is killed, with each copy gaining a weight of (w_1 + w_2)/2. Otherwise, the small-weight walker is branched and the large-weight walker is killed, with the copies having the same weight as before. Occasionally, walkers are exchanged between nodes to keep the weight per processor approximately constant. This conserves the total weight and keeps the number of walkers constant.

QWalk keeps track of two numbers: E_ref and E_0. E_ref is first set to the VMC average energy, and then to the energy of the last block. The energy that goes into the weights, E_0, is then calculated every few steps as

E_0 = E_ref − log( Σ_i w_i / N_conf ),   (4.1)

where N_conf is the number of sample points (configurations) in the simulation. During the DMC calculation, the local energy will very occasionally fluctuate far downward, causing the weight to increase too much. This can be fixed by cutting off the weights.
For fluctuations beyond ten standard deviations of the energy, we smoothly bring the effective time step to zero for the weights, which avoids the efficiency problem without introducing a noticeable error.

Reptation Monte Carlo

The fluctuations in the local energy part of the Green's function can cause the path in RMC to get stuck, so we cut off the effective time step in the same way as in DMC. The branching part of the Green's function is otherwise quite smooth. We use the same dynamic Green's function as we do in DMC (either a standard Metropolis rejection step or the UNR[42] algorithm), so we accept/reject based only on the branching part of

the Green's function, as explained in Sec. We use the bounce algorithm[43], which improves the efficiency by allowing the path to explore the many-body phase space much more quickly.

Correlated Sampling

Correlated sampling is a technique where one samples two very similar systems with the same set of samples. The variance in the difference decreases as Var(X − Y) = Var(X) + Var(Y) − 2Cov(X, Y), so for perfectly correlated sampling, the variance of the difference will be zero. In QWalk, this is handled by performing a primary walk that samples some probability distribution P_1(X). Averages are obtained as usual by calculating the integral ⟨O_1⟩ = ∫ P_1(X) O_1(X) dX. Suppose we wish to find ⟨O_2⟩ − ⟨O_1⟩. It can be written as

∫ [P_2(X) O_2(X) − P_1(X) O_1(X)] dX = ∫ P_1(X) [ (P_2(X)/P_1(X)) O_2(X) − O_1(X) ] dX.   (4.2)

Since we are sampling P_1(X), in the Monte Carlo averaging this integral is evaluated by averaging the weighted difference over sample points:

(1/N) Σ_i [ N w_i(X_i) O_2(X_i) / Σ_j w_j(X_j) − O_1(X_i) ].   (4.3)

The difference between the methods is only in how they determine the weights. VMC, DMC and RMC all support correlated sampling between arbitrary systems. In VMC, the weights are w(X) = Ψ_2²(X)/Ψ_1²(X), which is an exact relationship. DMC and RMC both require some approximation to the Green's function to weight the secondary averages properly. In both, we use the approximation of Filippi and Umrigar[12], who discuss the problem in detail. See also Chapter 5 for a discussion of correlated sampling and its limitations.

4.3 Systems

Boundary Conditions

Most systems of interest are covered by either open boundary conditions or periodic boundary conditions. Adding new boundary conditions is also quite simple. Molecules

with arbitrary atoms, charge, spin state, and with a finite electric field are supported. In 3D periodic systems, the calculation can be done at any real k-point, allowing k-point integrations, and there is also a finite size correction as discussed in Sec. The code has been used on systems with up to 135 atoms and 1080 electrons; the limiting factor is the amount of computer time needed to reduce the stochastic uncertainties.

Pseudopotentials

QWalk accepts pseudopotentials as an expansion of nonlocal angular momentum operators:

V̂_ECP = V_local(r) + Σ_{l=0}^{l_max} V_l(r) |l⟩⟨l|,   (4.4)

for an arbitrary maximum angular momentum. V_l is a basis function object that is typically a spline interpolation of a grid or a sum of Gaussian functions. While any pseudopotential of this form can be used, we use soft potentials in which the Z/r divergence has been removed from the nucleus-electron interaction. These potentials have been created specifically for QMC and are available in the literature[57, 58, 59], although more traditional Hartree-Fock or DFT pseudopotentials in the Troullier-Martins form work as well.

4.4 Forms of Wave function

For chemical problems, the first-order trial function is usually written as a Slater determinant taken from Hartree-Fock or Density Functional Theory, multiplied by a correlation factor (known as a Jastrow factor) which is optimized in Variational Monte Carlo. Between 90% and 95% of the correlation energy is typically obtained with this trial wave function in Diffusion Monte Carlo. One of the attractions of QMC is that, since all the integrals are done by Monte Carlo, almost any ansatz can be used, as long as it is reasonably quick to evaluate. QWalk's modular structure makes adding new wave function forms as simple as coding one-electron updates of the function value and derivatives, and adding one line to the main code to support loading of the module. We have implemented several forms of wave functions, which the user can combine.
For example, to create the Slater-Jastrow wave function, the user first asks for a multiply object, which contains two wave function objects. The user

then fills in a Slater determinant object and a Jastrow object. For a Pfaffian-Jastrow wave function, the user replaces the Slater determinant input with the Pfaffian input. Obviously, it is up to the user to make sure that the total wave function is properly antisymmetric and represents the problem correctly.

Slater Determinant(s)

This is the standard sum of Slater determinants, written as Ψ_T = Σ_i c_i D_i↑ D_i↓, where D_i↑ (D_i↓) is a determinant of the spin-up (spin-down) one-particle orbitals. The coefficients are optionally optimizable within VMC.

Jastrow Factor

The Jastrow factor is written as e^U, where

U = Σ_{iIk} c_k^{eI} a_k(r_{iI}) + Σ_{ijk} c_k^{ee} b_k(r_{ij}) + Σ_{ijIklm} c_{klm}^{eeI} [a_k(r_{iI}) a_l(r_{jI}) + a_k(r_{jI}) a_l(r_{iI})] b_m(r_{ij}),   (4.5)

where i, j are electron indices and I is a nuclear index. Both the coefficients and the parameters within the basis functions can be optimized. For the basis functions, we satisfy the exact electron-electron cusp conditions with the function b(r) = c·p(r/r_cut)/(1 + γ·p(r/r_cut)), where p(z) = z − z² + z³/3, γ is the curvature, which is optimized, and c is the cusp (0.25 for like spins and 0.50 for unlike spins). Further correlation is added by including functions of the form b_k(r) = a_k(r) = (1 − zpp(r/r_cut))/(1 + β·zpp(r/r_cut)), where zpp(x) = x²(6 − 8x + 3x²) and β is an optimized parameter. These functions have some excellent properties: they go smoothly to zero at a finite cutoff radius and cover the entire functional space between 0 and r_cut. This allows the Jastrow factor to be extremely compact, typically requiring optimization of around 25 parameters while still coming close to saturating the functional form. While these are the standard basis functions, they can be replaced or augmented by any in the program by a simple change to the Jastrow input. The third term in Eqn 4.5, which sums over two electron indices and ionic indices, can be expensive to evaluate for large systems and is sometimes excluded.
A Jastrow factor with only the first two terms is called a two-body Jastrow, and with the eei term included is called a three-body Jastrow.

Pfaffian Pairing Wave Function

Pairing wave functions with a Jastrow factor for molecules were first investigated by Casula and coworkers[60], who studied the constant-number-of-particles projection of the BCS wave function. We write the wave function as Ψ_T = e^U det Φ, where e^U is the Jastrow factor from above and the matrix Φ_ij = χ(r_i, r_j) is the pairing function between opposite-spin electrons (the function is easily extended for N_up ≠ N_down). This function contains the Slater determinant as a special case when χ is written as the sum over the occupied single-particle orbitals: χ(r_i, r_j) = Σ_k φ_k(r_i) φ_k(r_j).

We have implemented the Pfaffian[52] pairing wave function, which allows not only unlike-spin pairing, as the canonical projection of the BCS wave function does, but also like-spin pairing. The wave function is written as the Pfaffian of a matrix, and appears as the following:

Ψ_P = pf [  ξ↑↑     Φ↑↓    ϕ↑
           −Φ↑↓^T   ξ↓↓    ϕ↓
           −ϕ↑^T   −ϕ↓^T    0  ].   (4.6)

The Φ matrices are the same as in the BCS wave function, and the ϕ matrices are made up of the one-particle orbitals for a spin-polarized system. The ξ are antisymmetric triplet pairing matrices. The operation of the Pfaffian ensures that the entire wave function is antisymmetric. The Pfaffian wave function contains the BCS wave function as a special case without triplet pairing, and thus contains the Slater determinant wave function as well.

The general expansion for Φ is

Φ(r_1, r_2) = Σ_kl c_kl φ_k(r_1) φ_l(r_2)   (4.7)

under the constraint that c_kl = c_lk. ξ is written in a very similar way:

ξ(r_1, r_2) = Σ_kl d_kl φ_k(r_1) φ_l(r_2)   (4.8)

under the constraint that d_kl = −d_lk, so that ξ is antisymmetric. The sums extend over the virtual space of orbitals. The Pfaffian wave function is not used in this dissertation; for more information about its performance and implementation, see Ref [52] and references within.

One-particle orbital evaluation

We provide two major ways of evaluating the one-particle orbitals, the most expensive part of the QMC calculation. For a single electron, this is the problem of finding m = M_orb b, where m is a vector of the values of each orbital, M_orb is the orbital coefficient matrix, and b is the vector of basis function values.

The first (CUTOFF MO) is a linear scaling technique which, for localized orbitals and large enough systems, will take O(N) time to evaluate all orbitals for all electrons. For each basis function, it creates a list of orbitals for which the coefficient is above a cutoff. This is done at the beginning of the calculation. Then, given an electron position, it loops over only the basis functions within range of the electron, and then only over the orbitals contributed to by each basis function. Both loops are O(1) cost for large enough systems, so each electron is evaluated in O(1) time, giving O(N) scaling overall.

The second method (BLAS MO) is slightly simpler. While it scales in principle as O(N²), it can be faster than CUTOFF MO for medium-sized systems and on certain types of computers that have very fast BLAS routines, such as Itanium machines. Given an electron position, it loops through the basis functions within range of the electron, and adds to each molecular orbital the coefficient times the value of that basis function using fast BLAS routines.

4.6 Example calculation

To give a feeling for the flow of the program, we will go through a simple calculation. A schematic of the procedure is given in Fig 4.3. The first two steps are to choose the system and use a one-particle code such as GAMESS or CRYSTAL to prepare the one-particle orbitals, which is done as usual for that code. The converter program included with QWalk then creates the system, slater, and jastrow files automatically, so all the user must do is use the include directive to load them.
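The CUTOFF MO idea amounts to a sparse matrix-vector product whose sparsity pattern is fixed once at startup; a simplified stand-alone sketch (the real code also restricts the outer loop to basis functions within range of the electron, which we omit here):

```cpp
#include <vector>
#include <utility>
#include <cassert>
#include <cmath>

// Sparse orbital evaluation in the spirit of CUTOFF MO: for each basis
// function, precompute the list of orbitals whose coefficient magnitude
// exceeds a cutoff, then accumulate m = M_orb * b touching only those.
struct SparseOrbEval {
  int norb;
  std::vector<std::vector<std::pair<int, double> > > per_basis; // (orbital, coeff)

  SparseOrbEval(const std::vector<std::vector<double> >& M, double cutoff)
      : norb(M.size()), per_basis(M.empty() ? 0 : M[0].size()) {
    for (int o = 0; o < norb; o++)
      for (size_t b = 0; b < M[o].size(); b++)
        if (std::fabs(M[o][b]) > cutoff)
          per_basis[b].push_back(std::make_pair(o, M[o][b]));
  }

  // basis_vals holds the basis function values at the electron position.
  std::vector<double> evaluate(const std::vector<double>& basis_vals) const {
    std::vector<double> m(norb, 0.0);
    for (size_t b = 0; b < basis_vals.size(); b++)
      for (size_t k = 0; k < per_basis[b].size(); k++)
        m[per_basis[b][k].first] += per_basis[b][k].second * basis_vals[b];
    return m;
  }
};
```

For localized orbitals the per-basis lists stay short as the system grows, which is what turns the naive O(N²) product into O(N).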
In Fig 4.4, we evaluate the properties of the starting Slater wave function by creating 500 electron configurations and then propagating them for 16 blocks, each of which consists of 10 Monte Carlo steps with a time step of 1.0 a.u. The final set of configurations is then stored in configfile, and QWalk outputs the total energy and other properties that have been accumulated along the way. We then wish to recover some correlation energy by adding the Jastrow factor (Fig 4.5). The converter has already created a null Jastrow wave function, so we request a Slater-

Figure 4.3: Flow of a QMC calculation

include sysfile  #load the converted pseudo-nuclei, number of electrons
#load the Slater determinant of one-particle orbitals (already converted)
trialfunc { include slaterfile }

method { VMC
  nconfig 500             #number of configurations to generate
  nblock 16               #averaging blocks
  nstep 10                #steps to take per block
  timestep 1.0            #timestep to use (acceptance should be ~50%)
  storeconfig configfile  #save sample points (configurations) in a file
}

Figure 4.4: Example input file for VMC evaluation of properties. This corresponds to the fourth box in Fig 4.3.

include sysfile  #load system as usual
trialfunc {
  slater-jastrow  #a meta-wave function that multiplies two wave functions
  wf1 { include slaterfile }   #the Slater determinant again
  wf2 { include jastrowfile }  #Jastrow correlation factor (created by converter)
}
method { OPTIMIZE
  nconfig 500            #number of sample points to use
  iterations 30          #optimization iterations
  readconfig configfile  #configuration file from VMC above
}

Figure 4.5: Example input file for optimization of variational parameters. This is the fifth box in Fig 4.3.

Jastrow wave function. The first wave function is the Slater determinant that we used before, and the second is the Jastrow created by the converter. We request optimization using the fixed set of walkers that we generated in the previous VMC run.

Finally, we wish to evaluate properties of the new correlated wave function using the VMC routine (Fig 4.6). This input is the same as the last, except that we include the output wave function from the optimization in the trialfunc section. Also in the example, we perform a DMC calculation immediately after the VMC calculation. The input is basically identical for DMC, except that it requires a readconfig directive with the VMC configurations.

4.7 Other Utilities

Conversion of One-particle Orbitals

Currently, QWalk can import and use the orbitals from GAMESS[61] (Gaussian basis, molecules), CRYSTAL[62] (Gaussian basis, extended systems), and GP[63] (plane waves, extended systems). More interfaces are planned.

include sysfile
#load the wavefunction file generated by OPTIMIZE
trialfunc { include optfile.wfout }

#Same as above
method { VMC
  nconfig 500
  timestep 1.0
  nstep 10
  nblock 16
  readconfig configfile
  storeconfig configfile
}

#perform DMC
method { DMC
  nconfig 500
  timestep 0.02  #smaller timestep because of the short-time Green's function
  nstep 50       #more steps per block because of the small timestep
  nblock 16
  readconfig configfile
  storeconfig configfile
}

Figure 4.6: Example input file for evaluation of properties of the correlated wave function, plus a DMC calculation. This corresponds to the sixth and seventh blocks in Fig 4.3.

Plane Wave to LCAO converter

Gaussian basis sets have been used in quantum chemistry for years, and there has been an industry around making them systematically convergent. They are localized, which improves the scaling of QMC, and they allow a very compact expression of the one-particle orbitals, so fewer basis functions need to be calculated. Overall, a Gaussian representation can improve the performance of the QMC code by orders of magnitude over the plane-wave representation. We have developed a simple method to do this conversion that is fast and accurate.

We start with the plane-wave representation of the k-th orbital, Φ_k(r) = Σ_G c_kG e_G(r), and wish to find the LCAO equivalent Φ_k^LCAO(r) = Σ_j a_kj φ_j(r), where e_G is a plane-wave function and φ_j is a Gaussian function. Maximizing the overlap between Φ_k and Φ_k^LCAO, we obtain S a_k = P c_k, where S_ij = ⟨φ_i|φ_j⟩ and P_iG = ⟨φ_i|e_G⟩. The Gaussian coefficients are then given as a_k = S⁻¹ P c_k. The overlap integrals in S are easily written in terms of two-center integrals, and P is easily evaluated in terms of a shifted Gaussian integral. The limiting part of the conversion is the calculation of the inverse of S, which can be done with fast LAPACK routines.

4.8 Conclusion

QWalk is a step forward in creating a fast, usable, and extensible program for performing Quantum Monte Carlo calculations on electronic systems. It is able to handle very large systems on the scale of Quantum Monte Carlo calculations; the maximum size is mostly limited by the available computer time. It scales very efficiently with more processors (Fig 4.7), so it can take advantage of large clusters, multi-core computers, etc.
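Returning to the plane-wave-to-LCAO conversion above: once S and P are assembled, a_k = S⁻¹ P c_k is a small dense linear solve. A toy sketch with hand-rolled Gaussian elimination standing in for the LAPACK call (the function name and matrix sizes are ours):

```cpp
#include <vector>
#include <algorithm>
#include <cassert>
#include <cmath>

// Solve the small dense system S a = rhs by Gaussian elimination with
// partial pivoting, standing in for the LAPACK routine used in practice.
// Here rhs plays the role of the vector P c_k.
std::vector<double> solve(std::vector<std::vector<double> > S,
                          std::vector<double> rhs) {
  int n = S.size();
  for (int col = 0; col < n; col++) {
    int piv = col;
    for (int r = col + 1; r < n; r++)
      if (std::fabs(S[r][col]) > std::fabs(S[piv][col])) piv = r;
    std::swap(S[col], S[piv]);
    std::swap(rhs[col], rhs[piv]);
    for (int r = col + 1; r < n; r++) {
      double f = S[r][col] / S[col][col];
      for (int c = col; c < n; c++) S[r][c] -= f * S[col][c];
      rhs[r] -= f * rhs[col];
    }
  }
  std::vector<double> a(n);
  for (int r = n - 1; r >= 0; r--) {
    double s = rhs[r];
    for (int c = r + 1; c < n; c++) s -= S[r][c] * a[c];
    a[r] = s / S[r][r];
  }
  return a;
}
```

Since S is an overlap matrix and hence symmetric positive definite for a linearly independent basis, a Cholesky-based solver would also work and is what LAPACK would typically be asked for.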

Figure 4.7: Scaling of the QWalk code with the number of processors, in Monte Carlo steps per second. [Plot of Monte Carlo steps/second versus number of processors.] The system is a 2x2x2 cell of BaTiO3 with 320 electrons and one walker per node. This is VMC; DMC behaves very much the same, because of the constant-walker algorithm. This is close to a worst-case scenario for QMC, since the run was only approximately 40 seconds per processor.

Chapter 5

Theory: Forces in Quantum Monte Carlo

5.1 Approach

In QMC, the naïve Hellmann-Feynman estimator commonly used in mean-field calculations has infinite variance. Assaraf and Caffarel[16] and Chiesa et al.[14] have proposed methods to remove the spurious infinite-variance part of the estimator, leaving one with finite variance. However, Hellmann-Feynman forces deteriorate in efficiency quickly as the atomic charge increases; the variance increases to the point that they are not useful for atoms beyond the first row[14]. There is also a Pulay correction[64] from the fixed-node approximation, which may or may not be large, depending on the quality of the nodal surface. There is no known method for finding the Pulay correction for projector Monte Carlo. For these reasons, we did not use the Hellmann-Feynman approach to calculating the forces.

Forces by finite differences in QMC are also challenging. We will label a primary system by a 1 somewhere in the variable and a secondary system by a 2. It is hopeless to evaluate the energy difference between these systems by simply doing two separate calculations and taking the difference, because the error (squared) is given as ε²_{E2−E1} = ε²_{E2} + ε²_{E1}. Since each stochastic error will be on the scale of the total energy, the difference is generally

not resolvable. If instead the two calculations are linked so that the energies are correlated, then the error will be reduced by the covariance of the energies:

ε²_{E2−E1} = ε²_{E2} + ε²_{E1} − 2 cov(E_1, E_2),   (5.1)

which goes to zero in the limit of perfectly correlated random variables. In Variational Monte Carlo, this is a fairly simple procedure, and was explored by Umrigar[65] some time ago. The strategy is to sample a primary distribution Ψ_1²(R_1) and evaluate the energy of the system, and using the same samples evaluate the energy of a secondary system with a secondary wave function Ψ_2²(R_2), where R_2 = f(R_1) for some warping transformation f : R^{3Ne} → R^{3Ne}. We will use the warping factor suggested in Ref [12], which warps each electron independently:

r_i^(2) = r_i^(1) + [ Σ_I (r_I^(2) − r_I^(1)) r_{iI}^{−k} ] / [ Σ_I r_{iI}^{−k} ],   (5.2)

where k is some integer around 1-4, i and I are electronic and ionic coordinates respectively, and the electron-ion distance r_{iI} is measured in the primary system. The energy difference is then evaluated as

E_2 − E_1 = ∫ dR_2 Ψ_2²(R_2) E_{2L}(R_2) − ∫ dR_1 Ψ_1²(R_1) E_{1L}(R_1)
          = ∫ dR_1 Ψ_1²(R_1) [ J(R_2) Ψ_2²(R_2)/Ψ_1²(R_1) E_{2L}(R_2) − E_{1L}(R_1) ]
          = ⟨ J(R_2) Ψ_2²(R_2)/Ψ_1²(R_1) E_{2L}(R_2) − E_{1L}(R_1) ⟩_{Ψ_1²},

where J(R_2) is the Jacobian of the warping transformation, which comes about by changing integration variables from R_2 to R_1. Since the warping transformation is electron-by-electron, the total Jacobian J(R_2) is given as the product of the one-electron Jacobians:

J(R_2) = Π_{i=1}^{N_e} J(r_i^(2)).   (5.3)

In the simulation, then, we sample Ψ_1² as normal and accumulate the energy difference as a simple averaging variable equal to w E_{2L} − E_{1L}, where w(R) is the weight in the equation above. This variable has much lower variance than the local energy by itself, so the energy difference is calculable. This procedure does not involve an approximation beyond that of the variational wave function.

To move to projector Monte Carlo, we will work within the reptation Monte Carlo framework, since correlated sampling is conceptually easier there. Recall that in RMC, the ground-state energy is given by averaging the local energy at the end point of the path over the distribution of paths:

E_1 = ∫ ds_1 Π(s_1) HΨ(R_{1n})/Ψ(R_{1n}).   (5.4)

By going through the same steps as in VMC, we find that

E_2 − E_1 = ⟨ w E_{2L} − E_{1L} ⟩_{Π(s_1)},   (5.5)

where the weight is

w(s_2) = [Π_2(s_2)/Π_1(s_1)] Π_{i=0}^{n} J(R_{2i}).   (5.6)

The Jacobian is easily calculable from the warping function, but the ratio between the probability distributions Π_2(s_2)/Π_1(s_1) requires some approximation, since the exact Green's function is not known.

Filippi and Umrigar Approximation

Filippi and Umrigar[12] developed an approximation to the Green's function inspired by the work of Reynolds et al.[66]. For a pair of points on the path, R_1 and R_1′ (R_2 and R_2′), the approximation is that

J(R_2′) G_D(R_2 → R_2′, τ) Ψ_{2T}²(R_2) / [ G_D(R_1 → R_1′, τ) Ψ_{1T}²(R_1) ] ≈ Ψ_{2T}²(R_2′)/Ψ_{1T}²(R_1′).   (5.7)

In words, the approximation is that if we ignore the branching part of the importance-sampled Green's function, the dynamic Green's function samples Ψ_{1T}²(R_1), and the same will apply in the secondary system when connected by a proper space-warp transformation. This is not a bad approximation; the most efficient approximations to the Green's function for DMC[42] enforce this relationship by introducing a Metropolis acceptance/rejection step (see the discussion in Sec 3.2.1). The secondary Green's function is assumed to also generate Ψ_{2T}²(R_2) when constrained to follow the same (warped) path as the primary one. This is where the approximation can fall apart. Note that there is also a small dependence on the space-warping function. One would expect that the better the trial wave function,

the better the Filippi and Umrigar (F-U) approximation. Nonetheless, the formula for the weights is very smooth and easy to calculate:

w(s_2; s_1) = [Ψ_{2T}²(R_{2n})/Ψ_{1T}²(R_{1n})] J(R_{2n}) Π_{i=1}^{n} exp(−(τ/2)(E_{2L}(R_{2i}) + E_{2L}(R_{2(i−1)}))) / exp(−(τ/2)(E_{1L}(R_{1i}) + E_{1L}(R_{1(i−1)}))).   (5.8)

Pierleoni and Ceperley Approximation

Pierleoni and Ceperley[43] have developed an alternative formulation for the averaging weights by symmetrizing the short-time Green's function (Eqn 3.8). This operation can be justified by writing the non-importance-sampled Green's function in terms of the importance-sampled Green's function,

G(R, R′, τ) = [ G̃(R → R′, τ) G̃(R′ → R, τ) ]^{1/2},   (5.9)

as can be easily verified by plugging into Eqn 3.8. We write the symmetrized short-time Green's function as follows:

ln G(R, R′, τ) = −(τ/2) [E_L(R′) + E_L(R)]   (5.10a)
               − (τ/4) [F²(R′) + F²(R)]   (5.10b)
               − (R′ − R)²/2τ   (5.10c)
               − (1/2)(R′ − R)·[F(R′) − F(R)].   (5.10d)

The F-U approximation contains only term (a). While the F-U weights are relatively invariant to the time step and warping function, the P-C weights are quite sensitive, and can gain many factors of efficiency from a careful choice of simulation parameters. This can be understood by examining their Green's function approximation. Ideally, the difference of each of these terms with its counterpart in the primed system will have a very small variance. In the case of no space warping, i.e. R_2 = R_1, term (c) will be identically zero and most of the variance will come from (a) and (b), especially near the nuclei. Space warping shifts the electron distribution so that an electron close to the nucleus in the secondary system will also be close in the primary system. This has the effect of reducing the variance of (a) and (b) but increasing the variance of (c). Since the F-U approximation doesn't include (c), or in fact anything except (a), its variance decreases significantly with space warping.
The P-C Green's function approach, on the other hand, gains a large amount of

variance from (c). We can mitigate this by performing the simulation at as high a time step as possible and by using a smooth warping factor, setting k = 1 in Eqn 5.2. The final weighting factor is given by

w(s_2; s_1) = [ Ψ_{1T}(R_{10}) Ψ_{1T}(R_{1n}) Π_{i=1}^{n} G(R_{1i}, R_{1(i−1)}, τ) ]⁻¹ Ψ_{2T}(R_{20}) Ψ_{2T}(R_{2n}) Π_{i=1}^{n} G(R_{2i}, R_{2(i−1)}, τ) × Π_{i=0}^{n} J(R_{2i}).   (5.11)

While the P-C approach may seem more accurate than F-U, it is also an approximation, since the nodal condition may cause an inaccuracy in the Trotter expansion. This was noted by De Palo et al.[13], who attempted to solve the problem with a cutoff on the quantum force F in the importance-sampled Green's function. We cut off the force in the same way as they did, but as they noted, it is not clear that the expansion is still valid.

5.2 Test systems

The ultimate test of a force is conservation of energy over a path: the integral of the force along a path should equal the change in energy between the endpoints. That is, the relation

E(X_2) − E(X_1) = −∫_{X_1}^{X_2} F(X) · ds   (5.12)

should hold. This is easier to check than finding the derivative of the energy at a point, since it is additive, and therefore the statistical errors do not dominate the signal. Energy conservation is also the most stringent possible test for the forces, since any force that conserves energy is the correct force. Another option often used is the minimum-energy geometry, which is a necessary but not sufficient condition for correct forces.

H_2

For the simple H_2 dimer, one would expect both approximations to work well, since there are no nodes and the Slater-Jastrow wave function is quite good. In fact, we see that this is the case with a simple two-body Jastrow factor (Fig 5.1).

N_2

We tested the approximations on the N_2 dimer, squeezed to a bond length of 0.95 Å. We benchmarked the correlated sampling methods by displacing the dimer by Å

Figure 5.1: Conservation of energy for different methods of calculating the force on H 2. Both the F-U and P-C approximations are equally good in this case. (Axes: shifted energy in Hartree versus bond length in bohr; curves show the total energy and the integrated F-U and P-C forces.)

Figure 5.2: Calculated energy difference as a function of the error in a Jastrow parameter, for F-U and P-C with both Metropolis and Runge-Kutta Green's functions (dE in Hartrees versus a). An accurate 3-body Jastrow factor was computed and one parameter was varied by multiplying it by a, so at a = 1, the accurate Jastrow is used.

and calculating the energy difference. Both methods, using two different Green's functions to generate the dynamics, converged to statistically the same value (Fig 5.2), but the F-U method has much less error for a worse wave function. This is perhaps not surprising: in chemical systems, the dynamics are difficult to expand into a Trotter series accurately, and the Metropolis acceptance/rejection step performs better than most alternatives[42]. This difference in performance is not academic, since with a two-body Jastrow factor, the one typically used in large systems and solids to save computer time, the P-C reweighting does not conserve energy in N 2 (Fig 5.3).
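The energy-conservation criterion of Eq 5.12 can be checked numerically by comparing the endpoint energy difference with the work integral of the force. A minimal sketch, with an analytic toy potential in place of QMC output (the function name and data are ours):

```python
import numpy as np

def check_energy_conservation(x, energies, forces):
    """Compare E(x2) - E(x1) with -integral(F dx) along a 1D path (Eq 5.12).

    x        : sampled geometries along the path (1D array)
    energies : total energies at each geometry
    forces   : force component along the path at each geometry
    Returns (dE_from_energies, dE_from_forces); they should agree
    within statistical error if the forces are consistent.
    """
    dE_energy = energies[-1] - energies[0]
    # Trapezoidal rule for the work integral; F = -dE/dx
    dE_force = -np.sum(0.5 * (forces[1:] + forces[:-1]) * np.diff(x))
    return dE_energy, dE_force

# Toy harmonic example: E(x) = 0.5*(x - 1.4)^2, so F(x) = -(x - 1.4)
x = np.linspace(1.2, 1.8, 50)
E = 0.5 * (x - 1.4) ** 2
F = -(x - 1.4)
dE_e, dE_f = check_energy_conservation(x, E, F)
```

With noisy QMC forces, the additivity of the integral means the statistical error of dE_force grows only slowly with path length, which is why this test is practical.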

Figure 5.3: Test of conservation of energy on N 2 using a two-body Jastrow factor (total energy in Hartree versus bond length in bohr, for the total energy and the integrated F-U and P-C forces). The P-C reweighting does not obtain the correct RMC energy differences when integrated. With a 3-body Jastrow, they will obtain the same results, and with a worse Jastrow, they will both deviate from the total energy calculations, as shown in Fig 5.2.

Table 5.1: Summary of which methods work on which materials with which Jastrow factor.

Situation                  F-U works?  P-C works?
H 2 (any Jastrow)          yes         yes
N 2 (three-body Jastrow)   yes         yes
N 2 (two-body Jastrow)     yes         no
TiO (any Jastrow)          no          no

TiO

TiO is a more complicated system than N 2, as will be discussed in a later chapter. The important point is that the Slater-Jastrow trial wave function is worse for TiO than for N 2: the variance of the local energy (which would be zero for an exact wave function) is larger, and the difference between the VMC and the DMC energy is larger. These are unfavorable conditions, since both the F-U and P-C methods assume that the trial wave function is accurate. In Fig 5.4, we see that again the F-U method is more robust to errors in the wave function. If we then proceed to calculate the bond length, even with an accurate three-body Jastrow factor, the F-U method obtains 1.601(3) Å, which underestimates the bond length obtained by the Bayesian energy method (Chapter 6), 1.617(3) Å, in excellent agreement with the experimental value. For a two-body-only Jastrow, the one used in periodic calculations, F-U is worse, obtaining 1.590(2) Å. P-C obtains a bond length more in line with the total energy calculation, but this agreement is probably fortuitous, since it is clearly more sensitive to the wave function quality.

5.3 Conclusions

It appears that the F-U method is more robust to a poor trial wave function than the P-C full Green's function method. However, neither approximation to the Green's function is universally good, in that reasonable trial wave functions can cause the approximation to fail, as in the case of TiO (Table 5.1). We attempted to extend the correlated sampling method, but the results did not change significantly with resampling the proper Green's function within RMC or trying a higher-order expansion for the Green's function.
It appears that there are some divergences in the Green's function that, while they can be

Figure 5.4: Energy difference as a function of the error in the Jastrow factor for the TiO molecule, for the F-U and P-C methods (dE in Hartree versus a). This was done in the same way as in Fig 5.2. It is not possible to extrapolate to the correct answer since the error in the Jastrow factor is not a function of only one variable.

removed by cutoffs in the force when performing the stochastic process, make reweighting difficult. The method of Lattice Regularized DMC[45] may offer a way to improve this, since the Green's function is explicitly calculated in that method, and therefore should be known exactly. To date, however, this has not been tested.

From this study, it appears that forces are very challenging within projector Monte Carlo. With the F-U approach to correlated sampling and an accurate trial wave function, forces for simple systems like N 2 are available. We have tested along one axis, and found that beyond the first- and second-row elements, forces using current methodology are inaccurate. We have not treated the other axis, that of increasing the number of atoms in the simulation, and the forces may or may not be sufficiently accurate in that situation. This can in general be checked by calculating energy conservation along a path. For the transition metal oxides that are the focus of this dissertation, however, forces are unreliable because the trial wave functions are poor. A more subtle limitation of forces within QMC is that they have stochastic errors, which means that, at least for few-dimensional systems, the minimum energy geometry is most accurately found by fitting the force to a line. On the other hand, within QMC, the total energies are well known to be extremely reliable. In the next chapter, we will develop a stochastically correct way to use the energies with uncertainties to find the minimum energy geometry. Should forces become more reliable, the Bayesian method can use that information along with the total energy information.

5.4 Notes: Derivative of the Energy with Respect to the Lattice Constant

This section is a short note to those who may try to extend this work. To obtain the derivative of the energy with respect to the lattice constant of a periodic system, some care is necessary.
If the normal warping as above is used, then there can be parts of the space in the secondary system that are not sampled. It is not an optimal warping anyway. To guarantee correct sampling of space, we should map the unit cells of the primary and secondary system in a one-to-one relationship. This is done by working with the matrix of lattice vectors

L_1 = [x_1; x_2; x_3]   (5.13)

where the x's are the lattice vectors of the primary system. The matrix of lattice vectors of the secondary system is then related by a linear transform: L_2 = M L_1, so M = L_2 L_1^{-1}. Each electron in the secondary system is then related to its counterpart in the primary system by r_2i = M r_1i, and the Jacobian for that electron is simply the determinant of M.
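The cell-to-cell warp described above is a one-line linear map. A minimal sketch, with variable names of our choosing (L1, L2 hold the lattice vectors as matrix rows):

```python
import numpy as np

def warp_electrons(r1, L1, L2):
    """Map electron positions r1 (N x 3) of the primary cell into the
    secondary cell via the linear transform M = L2 L1^{-1}.
    Returns the warped positions and the per-electron Jacobian det(M)."""
    M = L2 @ np.linalg.inv(L1)
    r2 = r1 @ M.T            # r2_i = M r1_i for each electron
    jac = np.linalg.det(M)   # Jacobian of the map, the same for every electron
    return r2, jac

# Example: isotropically expand a cubic cell by 1%
L1 = np.eye(3) * 7.0
L2 = np.eye(3) * 7.07
r1 = np.random.rand(8, 3) * 7.0
r2, J = warp_electrons(r1, L1, L2)
```

Because the map is linear, the Jacobian is a constant, which keeps the reweighting of Eq 5.8 or 5.11 simple for lattice-constant derivatives.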

Chapter 6

Theory: Bayesian Fitting

The main disadvantage of QMC methods is that every quantity has a statistical uncertainty which decreases only as the square root of the computer time. For quantities like bond lengths, researchers have historically calculated the energy at several bond lengths, then fitted a function to the points. Uncertainties have been calculated in many ways, but to our knowledge, none of them is exact and makes use of all the information available (including the statistical uncertainty). Here we offer a more systematic way of finding the minimum bond length along with its error bar. This approach has been in use for a while in other fields (see Ref [67] for a very similar piece of work), and has proven to be a very useful and robust way of estimating parameters when the signal-to-noise ratio is low.

6.1 Theory

According to Bayes' theorem, given a model M and a set of data D, the probability of the model given the data is

P(M|D) = P(D|M) P(M) / P(D).   (6.1)

P(D) is an unimportant normalization constant, and P(M) is called the prior distribution, which we are free to set to reflect the a priori probability distribution on the set of models. Without any good reason to believe otherwise, we generally set P(M) = 1, the maximum

entropy/least-knowledge condition. In the case of normally distributed data on a set of points {x_1, x_2, ..., x_N},

P(D|M) ∝ exp[ −Σ_i (M(x_i) − D(x_i))^2 / 2σ^2(x_i) ],   (6.2)

where σ(x) is the statistical uncertainty of D(x). In the case of bond lengths, we can limit our space of models to M(x) = c_1 + c_2 x + c_3 x^2, for x close to the minimum bond length. This is equivalent to setting the prior distribution equal to one for all quadratic functions and to zero for non-quadratic functions. One then calculates several data points D(x) with statistical uncertainties σ(x). The probability distribution function of the bond length b is then obtained by evaluating the integral

p(b) = ∫ δ(−c_2/2c_3 − b) P(D|M) P(M) dc_1 dc_2 dc_3 / ∫ P(D|M) P(M) dc_1 dc_2 dc_3.   (6.3)

This integral is only three-dimensional, and as such could be calculated by a grid method, but we found it convenient to calculate it by Monte Carlo, by sampling P(D|M)P(M) and binning the bond length. The probability distribution function is a quite useful object, which gives some insight into which data points are useful.

6.2 Choosing Data Points

When one has data without uncertainty, the minimum is found to highest precision by choosing some evenly distributed points in the general vicinity of the minimum and fitting a quadratic potential. When uncertainty appears, this changes significantly. To demonstrate this, we take a model function f(x) = x^2 − 2x and try to find the minimum. We generate data points as D(x_i) = f(x_i) + χ, where χ is a Gaussian random variable with standard deviation 0.02, and calculate the probability distribution function of the minimum. In Fig 6.1, we use an intentionally bad method of choosing the set of data points, starting from one side and filling in a few data points at a time. This shows up in the probability distribution function of the minimum, where a long tail appears on the right side.
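The sampling-and-binning procedure for Eq 6.3 can be sketched with a simple Metropolis walk over the three coefficients (here indexed c0, c1, c2 rather than c_1..c_3), using the same toy function f(x) = x^2 − 2x; the step sizes and chain length are illustrative choices, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: f(x) = x^2 - 2x (minimum at x = 1) plus Gaussian noise
x = np.linspace(0.3, 1.7, 8)
sigma = 0.02
data = x**2 - 2*x + rng.normal(0.0, sigma, size=x.size)

def log_post(c):
    """log P(D|M) for the quadratic model M(x) = c0 + c1*x + c2*x^2
    with a flat prior P(M) = 1 (Eq 6.2)."""
    model = c[0] + c[1]*x + c[2]*x**2
    return -np.sum((model - data)**2) / (2*sigma**2)

# Simple Metropolis random walk in the 3 coefficients
c = np.array([0.0, -2.0, 1.0])  # start near the truth
lp = log_post(c)
minima = []
for step in range(20000):
    c_new = c + rng.normal(0.0, 0.02, size=3)
    lp_new = log_post(c_new)
    if np.log(rng.random()) < lp_new - lp:
        c, lp = c_new, lp_new
    if step > 2000:                      # discard burn-in
        minima.append(-c[1] / (2*c[2]))  # minimum of the quadratic model

minima = np.array(minima)
# A histogram of `minima` approximates p(b) of Eq 6.3
```

The binned samples give the full probability distribution function of the minimum, including its tails, which is exactly the diagnostic used in Figs 6.1 and 6.2.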
This is a signal that to reduce the uncertainty, there need to be more data points on the right side. If we ignore that signal and continue adding data points from left to right, it does eventually

Figure 6.1: Convergence of the minimum as we add data points starting from one side. (Left panels: f(x) plus noise versus x; right panels: probability distribution of the minimum, arbitrary units.)

Figure 6.2: Convergence of the minimum as we add data points starting from both sides and filling in the middle. (Left panels: f(x) plus noise versus x; right panels: probability distribution of the minimum, arbitrary units.)

A much more efficient way to obtain the minimum is to start at both ends and the middle. In Fig 6.2, we have followed this strategy. Note that in the first pair of graphs there are only three points, but the spread of the probability distribution function is about the same as in the second graph of the one-sided strategy, which has seven data points. As we add more points in the middle, the spread decreases much more slowly; if we were to add more points on the ends, it would decrease much faster. The spread of the probability distribution function of the minimum is mostly determined by how far away from the minimum the points can be calculated, and not as much by how accurately we have described the area in the center. This is a result of the stochastic uncertainties being less important the faster the function

is moving, which is fastest away from the minimum. In principle, the most efficient way to find the minimum would be to calculate data points very far from the minimum and obtain the quadratic function. However, potential energy surfaces are rarely well approximated by a quadratic function far away from a minimum, which leads us to look for more efficient fitting functions.

6.3 Optimal Fitting Function

There are two balancing factors in the search for a better fitting function. The first is the maximum range away from the minimum over which the function is accurate, as explained in the last section. The second is the number of parameters: the fewer parameters, the better, since more parameters increase the dimension of the integrals and thus the width of the eventual probability distribution. To find a good function, we take as a model system the LDA potential energy of BaTiO 3 as a function of the cubic lattice constant. BaTiO 3 will be discussed later in the dissertation, but the properties we will discuss are fairly universal for chemical systems. This is a more traditional fit, without statistical uncertainties. We will consider three fitting functions:

Quadratic: f(x) = c_0 + c_1 x + c_2 x^2
Cubic: f(x) = c_0 + c_1 x + c_2 x^2 + c_3 x^3
Morse: f(x) = c_0 + c_1 (1 − e^{−c_2 (x − c_3)})^2.

Another option is the Vinet function[68], which is very similar to the Morse, and has been used by Maezono et al.[69] for a more basic fitting procedure. As we can see in Fig 6.3, the Morse potential has by far the longest range of applicability, reproducing the correct minimum from radii of more than 0.6 Å. It is able to do this because it has the correct asymmetry of the underlying potential energy function (squeezing is much harder than stretching). On BaTiO 3, this increases the efficiency of the minimum energy calculation by about a factor of ten over a quadratic fitting function, despite the extra parameter.
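The Morse form above is a standard nonlinear least-squares problem. A minimal sketch using SciPy's curve_fit; the synthetic data and parameter values are illustrative, not the BaTiO 3 numbers:

```python
import numpy as np
from scipy.optimize import curve_fit

def morse(x, c0, c1, c2, c3):
    """Morse fitting form from the text: c0 + c1*(1 - exp(-c2*(x - c3)))**2.
    c3 is the minimum-energy lattice constant."""
    return c0 + c1 * (1.0 - np.exp(-c2 * (x - c3))) ** 2

# Synthetic asymmetric E(a) curve standing in for the LDA data
x = np.linspace(3.6, 4.6, 12)
energies = morse(x, -10.0, 2.0, 1.5, 4.0)

# A reasonable initial guess matters for nonlinear fits
popt, pcov = curve_fit(morse, x, energies, p0=[-9.0, 1.0, 1.0, 4.1])
# popt[3] is the fitted minimum; pcov gives (non-Bayesian) uncertainties
```

In the Bayesian scheme of Section 6.1, the same morse() function would replace the quadratic model inside the likelihood, at the cost of one extra integration dimension.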

Figure 6.3: Calculated minimum-energy lattice constant of BaTiO 3 by LDA for different fitting functions (quadratic, 3 parameters; cubic, 4 parameters; Morse, 4 parameters), as a function of the radius of the points used, in Angstrom.

6.4 Finding the Optimal Next Data Point

The method presented in this section has not been used in my calculations, since for one dimension the previous analysis appears to be sufficient; i.e., the strategy of sampling the data points away from the minimum and fitting using an intelligent function is quite good. This method will predict the best set of data points for a general function and observable, and therefore is presented with an eye towards more complicated problems, with perhaps several dimensions or many parameters.

Suppose that we have a set of data points, which give an expected value for some quantity q. We wish to calculate a new data point such that the variance of q, σ_q^2, is minimized. We can evaluate the variance as

σ_q^2 = ∫ (q − ⟨q⟩)^2 P(M|D) dM,   (6.4)

where we have used dM as a general integral over all models (which will be the fitting variables), and D is the data which we already have. If we obtain a new data point d(x) with standard deviation σ(x), then the new variance will be

σ'_q^2 = ∫ (q − ⟨q⟩)^2 P(M|D) Q(M|d, x, σ(x)) dM / ∫ P(M|D) Q(M|d, x, σ(x)) dM,   (6.5)

where

Q(M|d, x, σ(x)) = exp( −(M(x) − d(x))^2 / 2σ(x)^2 )   (6.6)

represents the information added by the new calculation. In a QMC calculation, we can choose x, the new position at which to calculate, and σ(x). d(x), of course, is unknown, but its probability distribution function is

pdf_{d(x)}(d|x) = ∫ dM δ(d − M(x)) P(M|D).   (6.7)

For a given x and σ(x), the expectation value is

⟨σ'_q^2(x, σ(x))⟩ = (1/N) ∫ (q − ⟨q⟩)^2 P(M|D) pdf_{d(x)}(d|x) Q(M|d, x, σ(x)) dM dd,   (6.8)

where

N = ∫ P(M|D) pdf_{d(x)}(d|x) Q(M|d, x, σ(x)) dM dd.   (6.9)

This is the quantity to be minimized with respect to x and σ(x).

What follows is a suggested way to evaluate this integral efficiently. Normally, in Monte Carlo, we would sample

P(M|D) pdf_{d(x)}(d|x) Q(M|d, x, σ(x))   (6.10)

and evaluate the expectation value ⟨(q − ⟨q⟩)^2⟩, but this has the disadvantage that for every x and σ(x), we have to generate a whole new set of samples, since the distribution is correlated. We can instead work with the probability distribution function

P(M|D) pdf_{d(x)}(d|x),   (6.11)

generate a set of configurations in the parameter space and then, since the probability distribution function is separable, generate the last dimension d separately according to pdf_{d(x)} for each x and σ(x) to be evaluated. The variance of the quantity (which is what is being minimized) is then evaluated as the average over sample points

σ'_q^2(x, σ(x)) = Σ_i w_i (q_i − ⟨q⟩)^2 / Σ_i w_i,   (6.12)

where the weights w_i = Q(M_i|d_i, x, σ(x)). The minimization can be done using the same samples for all x and σ(x), which correlates the estimates of σ'_q^2, so the differences are resolved much more precisely than the total values. This will allow for very efficient minimization routines with an arbitrary number of parameters.

6.5 Summary

The Bayesian approach to analyzing noisy data is a powerful tool. It uses all available information correctly (see Ref [70] for a long discussion of Bayesian inference versus more traditional statistical methods) and provides the complete probability distribution functions of any desired variables, even for complicated nonlinear functions of the energy. One can also predict, given the available data, what the optimal next data points should be. It is a very general method that has applications in experimental work as well, whenever there is noisy data.

Chapter 7

Application: Transition Metal Monoxide Molecules

7.1 Introduction

The transition metal-oxygen bond is the driving force behind many perovskite materials, which have been noted as having significant errors in the unit cell volume within DFT[71, 72], while being too large for post-Hartree-Fock methods to be applied. To go beyond DFT, benchmark calculations are particularly useful to determine what level of accuracy one can obtain from a given method, even though the precise bonding pattern can vary from system to system. Many authors (most recently, Refs [73, 74]) have studied the transition metal monoxides using Density Functional Theory and post-Hartree-Fock methods. The performance of these methods is less reliable than in s-p systems whenever transition metals are included, so the monoxides exhibit a problem similar to that of the transition metal solids. In particular, the calculation of dipole moments is challenging because it is rather sensitive to the details of the calculations and the sizes of the employed basis sets. In this chapter, we treat the first five TM-O molecules (Sc, Ti, V, Cr, and Mn), studying not only the binding energy, but also the bond length and the dipole moment. To obtain the dipole moment, we apply the relatively new Reptation Monte Carlo[20] method for the

first time to heavy elements. We find that for binding energies and bond lengths, QMC offers unmatched accuracy, while the dipole moment is in less agreement with experiment. We investigate the effect of the fixed-node condition on the dipole moment and find that while there is a significant nodal error, it is not enough to reconcile the calculation with the experiments.

7.1.1 Computational Parameters

For the oxygen atom, we used the pseudopotential from Lester[57], and for the transition metals, we used Ne-core soft potentials from Lee[58]. To prepare the orbitals for the QMC calculation, we used GAMESS[61] with a triple-zeta basis set. Both RMC and DMC calculations were performed with τ = 0.01 Hartree^{-1}, which was converged within error bars, and for our RMC calculations, we chose N = 301, which corresponds to a projection length of 3 Hartree^{-1}. We evaluate the dipole moment within RMC as the expectation value µ = ⟨Σ_i e r_i⟩ + µ_nuclei, using the Hellmann-Feynman theorem.

7.2 Results and Discussion

7.2.1 Energetics

We begin with the importance of the one-particle orbitals used in the trial function. These are not optimized within VMC, and the Jastrow factor does not change the nodes, so we are forced to use the nodes of the Slater determinant of orbitals from DFT or Hartree-Fock in the fixed-node projector Monte Carlo calculation. For systems without strong electron correlation (for example, first- and second-row elements), it has been standard practice to use DFT and Hartree-Fock orbitals interchangeably, and the fixed-node energy is fairly insensitive to this choice. In TMOs, the correlation is much stronger and changes the structure of the one-particle orbitals by enhancing the d-p hybridization (Figs 7.1 and 7.2). That is, the orbitals that define the lowest-energy nodal structure are significantly different from the Hartree-Fock orbitals.
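The RMC dipole estimator described above is a simple mixed average over sampled electron positions plus the nuclear term. A minimal sketch with our own function name and sign convention (electrons carry charge -e in atomic units; the conversion factor to Debye is standard):

```python
import numpy as np

DEBYE_PER_EA0 = 2.5417  # 1 e*a0 in Debye (atomic units to Debye)

def dipole_moment(elec_samples, nuc_charges, nuc_positions):
    """Hellmann-Feynman-style dipole estimate, in Debye:
    mu = -e * <sum_i r_i> + sum_A Z_A R_A  (atomic units internally).

    elec_samples  : (n_samples, n_elec, 3) electron positions from RMC
    nuc_charges   : (n_atoms,) effective (pseudopotential) nuclear charges
    nuc_positions : (n_atoms, 3) nuclear coordinates in bohr
    """
    mu_elec = -np.mean(np.sum(elec_samples, axis=1), axis=0)  # electron part
    mu_nuc = nuc_charges @ nuc_positions                       # nuclear part
    return (mu_elec + mu_nuc) * DEBYE_PER_EA0
```

In RMC the positions would come from the middle of the reptile, so that this is a pure rather than mixed estimator.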
Direct optimization of the orbitals within QMC is desirable, but very difficult for larger systems, so we took the approach of finding an optimal mean-field that produces orbitals that minimize the fixed-node energy. In these systems, the hybrid functional B3LYP appears to be near-optimal[75]. In Fig 7.3, we report on the

Table 7.1: Binding energies of the first five transition metal monoxides by different theoretical methods, along with RMS deviations from experiment (all in eV). Statistical uncertainties in units of 10^{-2} eV are shown in parentheses for Monte Carlo and experimental results. Zero-point energy corrections are estimated to be much less than the size of the uncertainty in the experiment.

Method       ScO       TiO       VO        CrO       MnO       RMS
LDA[73]
CCSD(T)[74]
TPSSh[73]
DMC          7.06(3)   6.81(3)   6.54(3)   3.98(2)   3.66(3)   0.21
Exp[77]      7.01(12)  6.92(10)  6.44(20)  4.41(30)  3.83(8)   0

energy gain in DMC by using B3LYP orbitals. In our five molecules, there are roughly three levels of energy gain, corresponding to the type of bonding. ScO has only one d electron, in a σ state; TiO and VO are respectively σ1δ1 and σ1δ2; and CrO and MnO are σ1δ2π1 and σ1δ2π2. Each new type of symmetry adds approximately 0.2 eV to the energy gain from using B3LYP orbitals, with a slight decrease for the half-filled shell of MnO. This energy gain is a measure of how poor the independent-electron approximation is for preparing the one-particle orbitals. Since there is almost no gain in the atomic systems from using B3LYP orbitals versus Hartree-Fock orbitals, the error in the energy does not cancel and shows up directly in the binding energy. The correct orbitals appear to be critical for high accuracy in the binding energy of TMO materials, even more so as more d-symmetry electrons are present. We compare our QMC results to Density Functional Theory in the LDA, Coupled Cluster with singles, doubles, and perturbative triples (CCSD(T)), and a new hybrid meta-GGA, TPSSh, which should be the most accurate semi-empirical DFT available[76]. Using accurate one-particle orbitals, the DMC binding energies (Table 7.1) all fall within experimental uncertainties except for CrO and MnO, which both have π-type electronic configurations. The RMS deviation of DMC, 0.21 eV, is around 50% smaller than those of TPSSh and CCSD(T).
This is still above the systematic error of 0.05 ev that would be required for chemical accuracy ; however, the uncertainties of the experiments are also above this threshold. Table 7.2 shows the calculated versus experimental bond lengths for the selected methods. Here again, we see that DMC using a Slater determinant of B3LYP orbitals is quite accurate, with RMS deviation less than 0.01 Å, close to the size of our statistical

Figure 7.1: The d-p hybridization orbital (doubly occupied) for TiO in Hartree-Fock (top) and B3LYP (bottom). B3LYP enhances the hybridization significantly, which leads to a lower energy in QMC.

Figure 7.2: The d-p hybridization orbital (doubly occupied) for MnO in Hartree-Fock (top) and B3LYP (bottom). B3LYP enhances the hybridization significantly, which leads to a lower energy in QMC.

Figure 7.3: The energy gain in DMC from using B3LYP orbitals, E(DMC, B3LYP orbitals) − E(DMC, HF orbitals) in eV, as a function of the metal monoxide (ScO: σ1; TiO: σ1δ1; VO: σ1δ2; CrO: σ1δ2π1; MnO: σ1δ2π2). The line is a guide to the eye.

Table 7.2: Bond lengths in Å. The statistical uncertainties for ScO, TiO, VO, CrO, and MnO are respectively 0.002, 0.003, 0.003, 0.004, and

Method       ScO   TiO   VO    CrO   MnO   RMS
LDA[73]
CCSD(T)[74]
TPSSh[73]
DMC
Exp[77]

Table 7.3: Dipole moments in Debye. The fixed-node RMC results have been obtained with a single determinant of B3LYP orbitals. See the text for an analysis of the errors involved for the case of TiO.

Method       ScO       TiO       VO        CrO       MnO
LDA[73]
CCSD(T)[74]
TPSSh[73]
RMC          4.61(5)   4.11(5)   4.64(5)   4.76(4)   5.3(1)
Exp[78] (1)[79]

error. It is again around 50% more accurate than the high-accuracy methods CCSD(T) and TPSSh, and four times more accurate than the LDA on these systems, on average.

7.2.2 Dipole Moment

The dipole moment of these molecules has been noted as a difficult quantity to calculate accurately[73, 74]. In approaches relying on basis functions, there appears to be a large sensitivity to the quality of the basis set. It also appears to be very sensitive to an accurate treatment of the correlation. The RMC method depends only very weakly on the basis set used to prepare the orbitals, and reaches the lowest energy of any variational method on these systems, so one may hope that RMC agrees with experiment better than other ab initio methods. We find this not to be the case. As shown in Table 7.3, we find serious disagreement with experiment in three of the four molecules for which experiments are available. Only ScO, the molecule with the weakest d-character, agrees well; the rest are universally predicted to be much higher in QMC. The significant discrepancies from experiment are surprising given the excellent

agreement that we obtained for the energies. It also seems strange that the LDA is generally quite close to experiment, since we know that for energetics it performs relatively poorly. We analyze the errors present in the calculations as follows for the case of TiO, the simplest of the molecules with a large difference from experiment.

Pseudopotential error. We checked the pseudopotential in mean-field calculations, and it caused an overestimation of the dipole moment in TiO by 0.1 Debye. This is systematic for all five materials studied, with each having an overestimation of between 0.1 and

Hellmann-Feynman theorem. The definition of the dipole moment is µ = −d⟨H⟩/dE, where E is the electric field. We have used the Hellmann-Feynman theorem to instead evaluate it as µ = ⟨Σ_i e r_i⟩ + µ_nuclei. As shown in Ref [64], the Hellmann-Feynman theorem for calculating the dipole moment does not exactly apply in fixed-node Quantum Monte Carlo, although the errors are expected to be extremely small. To check this, we calculated the dipole moment using the finite-field approach and correlated sampling[12], using nodes from B3LYP also at that field, and obtained the same result as the Hellmann-Feynman estimator within error bars.

Fixed-node error and localization. The only remaining approximation in our simulation is the combined fixed-node and localization error. We investigate this by attempting to systematically improve the trial wave function. To go beyond the Slater-Jastrow form, we began to add more determinants into the trial function for the test case of TiO. We performed a configuration interaction calculation including single and double excitations starting from the B3LYP orbitals, kept the determinants with the highest weights, and reoptimized the weights in the presence of the Jastrow factor.
If we did not reoptimize the weights, we found that the fixed-node energy actually increases, suggesting that the correlation present in the Jastrow factor is critical to an accurate description of these materials. The Jastrow factor also reduces the number of determinants necessary to describe the electron correlation by several orders of magnitude, since it contains most of the so-called dynamic correlation. Finally, we used an RMC simulation to find the dipole moment using the Hellmann-Feynman theorem. Fig 7.4 shows significant improvements in the energy, on the order of 0.015 Hartree, and the dipole moment appears to oscillate around a value of approximately 3.9(1) Debye. Clearly

Figure 7.4: The number of determinants versus the energy (Hartree) and dipole moment (Debye) for TiO. The dipole moments are shifted downwards by 0.1 Debye to correct for the pseudopotential error.

the high accuracy of the binding energy comes from a cancellation of errors between the atom and the molecule, as is typical in variational techniques, and we are still far from the exact solution. Therefore, we estimate the error in the dipole moment to be approximately 0.2 Debye from the fixed-node approximation and 0.1 Debye from the pseudopotential approximation. With these corrections, the minimum value of the dipole moment for TiO is 3.6 Debye with over 95% confidence. This is consistent with the CCSD(T) number, but significantly larger than the value reported by experiment.

7.3 Conclusions

We have found that for energetics, DMC using a single-determinant trial function is remarkably accurate, perhaps suggesting that for these materials it is sufficient. The bond lengths and binding energies are, on average, 50% better than the best meta-GGA and CCSD(T). The Bayesian method for finding the minimum bond lengths mitigates the inconvenience of statistical uncertainty, while improving the performance by using all possible information. The dipole moment appears to be more challenging, and requires a complicated treatment of the wave function nodes to obtain a value that is stable with respect to changes in the nodes. We have obtained a converged value for TiO, however, and it is still somewhat higher than suggested by experiment. This is in line with the large values for the dipole moment obtained by CCSD(T) and B3LYP; the agreement with the CC method is particularly reassuring. In addition, there are to our knowledge only two experimental measurements of the dipole moment[79, 80], from the same group, which report significantly different moments (2.96 versus 3.34 Debye). This may indicate a sizeable uncertainty in the experiment as well. The apparent agreement of the LDA is almost certainly fortuitous, because the LDA underestimates the bond length, which in turn causes the dipole moment to be too small. In fact, one would generally expect the LDA to underestimate the dipole moment even at the correct bond length, since it tends to make the molecule not ionic enough. This may also indicate an inaccuracy in the experiment. One would expect that the corrections for the fixed-node approximation for VO, CrO, and MnO are similar to or slightly greater than those for TiO, and therefore are around Debye. The oscillatory convergence that we see in the dipole moment is also seen in Configuration Interaction-only calculations, so it is not simply a problem with our stochastic optimization routines.
Since the dipole moment is a very small perturbation on top of the total density, different partial summations of electronic correlation will emphasize different electronic configurations, which will in turn change the one-body density slightly. The dipole moment remains an extraordinarily sensitive quantity that is a stringent test of theory and experiment. It may be interesting to see if there are other wave functions that can describe the nodal surface to sufficient accuracy while being more compact than a large determinantal expansion.

Chapter 8

Application: Perovskite Crystals

The class of perovskite crystals is very versatile, exhibiting ferromagnetism, ferroelectricity, and extremely high dielectric constants. Perovskites are all of the form ABO 3, where at least one of A and B is a transition metal. We chose to study ferroelectric perovskites, starting with the classic BaTiO 3, one of the most-studied ferroelectrics. The goal is to see if the high accuracy that QMC obtains for the transition metal molecules also applies to more complicated materials and solids. We will then study the less understood and more challenging system BiFeO 3 with the goal of understanding the somewhat ambiguous experiments on that material.

8.1 BaTiO 3

The cubic crystal structure of BaTiO 3 is presented in Fig 8.1. The tetragonal ferroelectric distortion occurs qualitatively when the Ti atom moves up in the cell towards an oxygen atom, creating a covalent bond at the expense of the bonds it has formed with the rest of the oxygens. Charge moves from being evenly distributed around the cell to being more concentrated on the Ti-O bond, creating a net polarization (see Figs 8.2 and 8.3). The effect is ferroelectric because once the top oxygen has bonded with the Ti, it will not bond with the Ti above it, creating a tendency for all the Ti-O bonds to go in the same direction. While a solid will generally be sectioned into randomly oriented domains

of parallel cells, an electric field can align them all, and they will stay aligned until otherwise disturbed.

Figure 8.1: BaTiO3 structure. The titanium is the central atom, with barium at the corners and oxygen on the faces.

The energy gained in going from the centrosymmetric to the polarized state is on the order of 20 meV (230 K) per formula cell, which is extremely small on the scale of chemical bonding. If the lattice constant is too small, it is not energetically favorable for the Ti to move up, because it would then be too close to the oxygen to form a bond. Since the potential energy surface is so shallow, even a 2% underestimation of the cubic lattice constant can remove the ferroelectric effect completely; conversely, an overestimation of the lattice constant can significantly enhance the effect [81, 82]. This sensitivity has been exploited in thin films by growing BaTiO3 on a lattice-mismatched substrate, which can enhance the attainable polarization [83, 84]. As a benchmark case, we study the cubic lattice constant of BaTiO3. This has the benefit of being well characterized experimentally and fairly simple; while a more complete study is desirable, the computational cost is too high to study more than one degree of freedom within QMC. Within DFT, LDA underestimates the lattice constant by about 1%. Supposedly more accurate GGA functionals typically do worse, overestimating the lattice constant by at least 1% [71]. Most researchers have overcome this by constraining the volume to match experiment. With this modification, DFT agrees quite well with the experimental values for the ferroelectric distortion, as well as the polarization. While this can be viewed as a success, fixing the lattice constant to experiment is a step back from true first-principles calculations toward a more heuristic approach.
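As a quick sanity check on the energy scale quoted above, the ~20 meV well depth can be converted to a temperature by dividing by the Boltzmann constant (the value used below is the standard CODATA figure):

```python
K_B = 8.617333e-5  # Boltzmann constant in eV/K

def energy_scale_in_kelvin(energy_ev):
    # Temperature scale T = E / k_B corresponding to an energy E in eV.
    return energy_ev / K_B

t_ferro = energy_scale_in_kelvin(0.020)  # 20 meV ferroelectric well depth
print(f"{t_ferro:.0f} K")                # about 230 K, as quoted in the text
```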
There is a practical issue as well: as interest in thin-film ferroelectrics increases, the experimental lattice constant may not be known for thin films on a substrate, which makes calculations

using DFT uncertain. While there have been attempts at creating a GGA that obtains accurate results for ferroelectrics [85], there are in fact many possible formulations of GGA [86], none of which is clearly optimal from first principles. Thus, the lattice constant is a particularly stringent test of a method, in addition to being sorely needed for a truly first-principles description of BaTiO3.

8.2 Computational Parameters

We used a 2x2x2 supercell of 40 atoms with a 2x2x2 k-point mesh for all calculations. The finite size errors were checked with a 3x3x3 cell, as shown in Fig 3.2. For the oxygen pseudopotential, we used the Lester [57] pseudopotential; for Ti and Fe, Y. Lee's [58]; and for Ba, the Stuttgart potential [87], modified to remove the Z/r divergence in the potential energy. This potential is given in Table B.1. For Bi, we used the 23-electron Stuttgart potential [88]. In the DMC calculations, we used a time step of 0.02 Hartrees^-1 for BaTiO3 and 0.01 for BiFeO3, both of which were converged within error bars.

8.3 Optimal One-Particle Orbitals and Binding Energy

We used the CRYSTAL package [62] to prepare the one-particle orbitals with both Hartree-Fock and LDA. LDA orbitals were used because B3LYP is not very accurate for solids. BaTiO3 behaves very similarly to the TMO molecules, with the LDA orbitals giving a significantly lower energy (Table 8.1). The reasons are also similar: the LDA tends to create more hybridization between the metal and oxygen atoms (Figs 8.2 and 8.3). This appears to be more representative of the true correlated wave function, since the reptation Monte Carlo density is not significantly different from the LDA one.

8.4 Lattice Constant

The raw data from the DMC simulations are given in Table 8.2, and the fit to these data is shown in Fig 8.4. There are three major corrections to these data: finite size error, zero-point energy, and a dynamical effect arising from the fact that the cubic phase is not stable at zero

Figure 8.2: BaTiO3 one-particle density isosurface using LDA (red or light grey) and Hartree-Fock (blue or dark grey) in the cubic phase. The cell is shifted so the bonds are easier to see. The titanium atom is the visible atom in the upper right. The isosurface value is set so that the surfaces overlap on the oxygen atoms.

Figure 8.3: BaTiO3 one-particle density isosurface in the tetragonal phase. The bonds with all but one oxygen are significantly reduced, and the system becomes very similar to the TiO molecule. The difference between DFT and Hartree-Fock is the same as it was in TiO, with DFT-LDA (red or light grey) enhancing the hybridization over Hartree-Fock (blue or dark grey).

Table 8.1: Cohesive energy of BaTiO3 by various first-principles methods. All theoretical values are shifted by a 0.13 eV zero-point energy correction.

Method               Cohesive energy/cell (eV)
Hartree-Fock
DMC (HF orbitals)    30.0(3)
DMC (LDA orbitals)   31.2(3)
Experiment           31.57 [89]
PW-91 (GGA DFT)
LDA                  44.14
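Given the stochastic error bars in Table 8.1, the comparison with experiment can be made explicit. The sketch below (numbers copied from the table) expresses each DMC cohesive energy's discrepancy from experiment in units of its error bar:

```python
def sigma_discrepancy(value, stoch_err, reference):
    # Distance from the reference value in units of the stochastic error bar.
    return abs(value - reference) / stoch_err

E_EXPT = 31.57  # eV/cell, experimental cohesive energy of BaTiO3 [89]
dmc_results = {"DMC (HF orbitals)": (30.0, 0.3),
               "DMC (LDA orbitals)": (31.2, 0.3)}

for label, (energy, err) in dmc_results.items():
    n_sigma = sigma_discrepancy(energy, err, E_EXPT)
    print(f"{label}: {n_sigma:.1f} sigma from experiment")
```

The LDA-orbital result lies within about one error bar of experiment, while the HF-orbital result is several error bars away, consistent with the discussion of orbital quality above.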

temperature.

Table 8.2: DMC data for the energy of BaTiO3 as a function of lattice constant (2x2x2 cell, averaging over 4 real k-points). Columns: lattice constant (Å), energy/cell (Hartrees), stochastic error.

Finite Simulation Cell Error

The finite size error in the energy is proportional to 1/N_e. Following Maezono et al. [69], the error can be partially removed by extrapolating the energy as a function of the lattice constant a as

E∞(a) = [N E_N(a) - M E_M(a)] / (N - M).    (8.1)

For a 1x1x1 cell (M = 40), the minimum-energy lattice constant is 4.02 Å, and for a 2x2x2 cell (N = 320) it is 3.975 Å. Since we are interested only in the order of magnitude of this effect, we use the maximum-likelihood fits to correct for the finite size error. Doing so, we obtain a correction of Å. A full Bayesian analysis is not necessary for such a small correction, because we only need a reasonable approximation to it.

Zero-Point Energy

Using the Quantum-Espresso package [90] and Vanderbilt ultrasoft pseudopotentials [91], we calculated the gamma-point phonon frequencies as a function of lattice constant within LDA (Fig 8.5). The three soft modes were counted as zero in the zero-point energy, since their contribution is very small. Fitting quadratic functions to the LDA

Figure 8.4: Parameters of the fitting function E(a) = b (1 - exp(-c (a - a0)))^2 + E0 from a 2x2x2 supercell calculation of BaTiO3, along with the mean and standard deviation of the Gaussian fits. The fit parameters are b (Hartrees), c (1/Å), a0 (Å), and E0 (Hartrees).

potential energy surface, we obtain minima of Å for the unshifted curve and Å when accounting for the zero-point energy. There is therefore a small correction of Å, which essentially cancels the finite size error for the 2x2x2 cell. We also obtain a correction to the binding energy of Hartrees/cell = 0.13 eV/formula cell.

Lattice Dynamics

In the cubic phase, the system does not spend much time in the fully symmetric configuration with Ti at the center. Particularly near the transition temperature, the atoms tend to stay in the wells of the soft mode, moving between the 6 equivalent configurations [92]. We can estimate the order of magnitude of this effect by calculating the lattice constant as a function of the soft-mode distortion (Fig 8.6) and averaging over the possible atomic configurations. Doing this results in a correction of Å in the lattice constant. Our best estimate for the minimum-energy lattice constant is therefore 3.975 (2x2x2 cell) (finite size error) + 0.005 (zero-point energy) + 0.006 (dynamic effects) = ± Å. For comparison, the experimental lattice constant extrapolated to zero temperature is Å, so the DMC error is between 0.01 and 0.02 Å. This is quite good agreement, considering that we made both the fixed-node and pseudopotential approximations, and it is indicative of the universality of quantum Monte Carlo that the same method obtains roughly the same accuracy for molecules and solids.

8.5 BiFeO3

BiFeO3 is a material that exhibits multiferroicity: simultaneous antiferromagnetism and ferroelectricity. It was discovered in 1957 [93] and shown to be antiferromagnetic [94] shortly thereafter. Some controversy existed over whether it was also ferroelectric until Teague et al. [95] measured a small spontaneous polarization of 6.1 µC/cm^2 in the 111 direction in the rhombohedral space group R3c.
Teague noted at that time that the coercive field (the electric field necessary to switch the ferroelectric) was quite large, and he was unable to saturate the hysteresis curves. BiFeO3 was then not much studied until it returned in a flurry of interest in multi-

Figure 8.5: Zero-point energy correction for BaTiO3 (ZPE in Hartree/cell versus cubic lattice constant in Å).

Figure 8.6: Dynamic shift in the lattice constant for BaTiO3, along with the energy per cell, as a function of the soft-mode distortion (Ba-Ti distance and shift in Å, energy in Hartree). This is calculated in LDA using CRYSTAL.

Figure 8.7: Formula unit cell of BiFeO3 in the ferroelectric phase, with Fe in the center and Bi at the corners. The oxygen cage is rotated and the cell is slightly distorted from the cubic phase. The antiferromagnetic ordering is in the 111 direction.
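The lattice-constant bookkeeping of Section 8.4 can be collected in a short sketch: the two-cell extrapolation of Eq. (8.1) plus the zero-point and dynamical corrections. The finite-size correction below is left as a placeholder (its value is not preserved in this transcription), and the extrapolation demonstration uses made-up energies.

```python
def two_cell_extrapolation(e_n, e_m, n, m):
    # Eq. (8.1): E_inf(a) = (N*E_N(a) - M*E_M(a)) / (N - M),
    # with N and M the electron counts of the two simulation cells.
    return (n * e_n - m * e_m) / (n - m)

# Demonstration with made-up per-cell energies for M = 40 and N = 320 electrons.
e_inf = two_cell_extrapolation(e_n=-159.10, e_m=-159.00, n=320, m=40)
print(f"extrapolated energy: {e_inf:.4f} Ha")

# Correction bookkeeping for the cubic lattice constant (Angstrom); the values
# below are those quoted in Section 8.4, except the finite-size term, which is
# a placeholder standing in for the value lost from this copy of the text.
a_222  = 3.975   # minimum of the 2x2x2-cell DMC fit
da_fs  = 0.000   # finite-size correction (placeholder)
da_zpe = +0.005  # zero-point energy correction
da_dyn = +0.006  # dynamical (soft-mode averaging) correction

a_best = a_222 + da_fs + da_zpe + da_dyn
print(f"best estimate: {a_best:.3f} Angstrom")
```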


More information

Computational methods: Coupled Electron Ion Monte Carlo Path Integral Monte Carlo Examples: Electron gas Hydrogen

Computational methods: Coupled Electron Ion Monte Carlo Path Integral Monte Carlo Examples: Electron gas Hydrogen ! Computational methods: Coupled Electron Ion Monte Carlo Path Integral Monte Carlo Examples: Electron gas Hydrogen WHO DID THE WORK? Miguel Morales, Livermore Carlo Pierleoni: L Aquila, Italy Jeff McMahon

More information

DFT EXERCISES. FELIPE CERVANTES SODI January 2006

DFT EXERCISES. FELIPE CERVANTES SODI January 2006 DFT EXERCISES FELIPE CERVANTES SODI January 2006 http://www.csanyi.net/wiki/space/dftexercises Dr. Gábor Csányi 1 Hydrogen atom Place a single H atom in the middle of a largish unit cell (start with a

More information

All-electron quantum Monte Carlo calculations for the noble gas atoms He to Xe

All-electron quantum Monte Carlo calculations for the noble gas atoms He to Xe All-electron quantum Monte Carlo calculations for the noble gas atoms He to Xe A. Ma, N. D. Drummond, M. D. Towler, and R. J. Needs Theory of Condensed Matter Group, Cavendish Laboratory, University of

More information

Pseudopotentials: design, testing, typical errors

Pseudopotentials: design, testing, typical errors Pseudopotentials: design, testing, typical errors Kevin F. Garrity Part 1 National Institute of Standards and Technology (NIST) Uncertainty Quantification in Materials Modeling 2015 Parameter free calculations.

More information

Auxiliary-field quantum Monte Carlo calculations of excited states and strongly correlated systems

Auxiliary-field quantum Monte Carlo calculations of excited states and strongly correlated systems Auxiliary-field quantum Monte Carlo calculations of excited states and strongly correlated systems Formally simple -- a framework for going beyond DFT? Random walks in non-orthogonal Slater determinant

More information

Establishing Quantum Monte Carlo and Hybrid Density Functional Theory as Benchmarking Tools for Complex Solids

Establishing Quantum Monte Carlo and Hybrid Density Functional Theory as Benchmarking Tools for Complex Solids Establishing Quantum Monte Carlo and Hybrid Density Functional Theory as Benchmarking Tools for Complex Solids Kevin P. Driver, Ph.D. Defense, February 1, 011 Outline 1) Introduction: Quantum mechanics

More information

Chemistry 4560/5560 Molecular Modeling Fall 2014

Chemistry 4560/5560 Molecular Modeling Fall 2014 Final Exam Name:. User s guide: 1. Read questions carefully and make sure you understand them before answering (if not, ask). 2. Answer only the question that is asked, not a different question. 3. Unless

More information

This is a very succinct primer intended as supplementary material for an undergraduate course in physical chemistry.

This is a very succinct primer intended as supplementary material for an undergraduate course in physical chemistry. 1 Computational Chemistry (Quantum Chemistry) Primer This is a very succinct primer intended as supplementary material for an undergraduate course in physical chemistry. TABLE OF CONTENTS Methods...1 Basis

More information

Advanced Quantum Chemistry III: Part 3. Haruyuki Nakano. Kyushu University

Advanced Quantum Chemistry III: Part 3. Haruyuki Nakano. Kyushu University Advanced Quantum Chemistry III: Part 3 Haruyuki Nakano Kyushu University 2013 Winter Term 1. Hartree-Fock theory Density Functional Theory 2. Hohenberg-Kohn theorem 3. Kohn-Sham method 4. Exchange-correlation

More information

2 Electronic structure theory

2 Electronic structure theory Electronic structure theory. Generalities.. Born-Oppenheimer approximation revisited In Sec..3 (lecture 3) the Born-Oppenheimer approximation was introduced (see also, for instance, [Tannor.]). We are

More information

Computational Methods. Chem 561

Computational Methods. Chem 561 Computational Methods Chem 561 Lecture Outline 1. Ab initio methods a) HF SCF b) Post-HF methods 2. Density Functional Theory 3. Semiempirical methods 4. Molecular Mechanics Computational Chemistry " Computational

More information

Electronic structure theory: Fundamentals to frontiers. 2. Density functional theory

Electronic structure theory: Fundamentals to frontiers. 2. Density functional theory Electronic structure theory: Fundamentals to frontiers. 2. Density functional theory MARTIN HEAD-GORDON, Department of Chemistry, University of California, and Chemical Sciences Division, Lawrence Berkeley

More information

Answers Quantum Chemistry NWI-MOL406 G. C. Groenenboom and G. A. de Wijs, HG00.307, 8:30-11:30, 21 jan 2014

Answers Quantum Chemistry NWI-MOL406 G. C. Groenenboom and G. A. de Wijs, HG00.307, 8:30-11:30, 21 jan 2014 Answers Quantum Chemistry NWI-MOL406 G. C. Groenenboom and G. A. de Wijs, HG00.307, 8:30-11:30, 21 jan 2014 Question 1: Basis sets Consider the split valence SV3-21G one electron basis set for formaldehyde

More information

Dept of Mechanical Engineering MIT Nanoengineering group

Dept of Mechanical Engineering MIT Nanoengineering group 1 Dept of Mechanical Engineering MIT Nanoengineering group » To calculate all the properties of a molecule or crystalline system knowing its atomic information: Atomic species Their coordinates The Symmetry

More information

Optimization of quantum Monte Carlo wave functions by energy minimization

Optimization of quantum Monte Carlo wave functions by energy minimization Optimization of quantum Monte Carlo wave functions by energy minimization Julien Toulouse, Roland Assaraf, Cyrus J. Umrigar Laboratoire de Chimie Théorique, Université Pierre et Marie Curie and CNRS, Paris,

More information

Chemistry 3502/4502. Final Exam Part I. May 14, 2005

Chemistry 3502/4502. Final Exam Part I. May 14, 2005 Chemistry 3502/4502 Final Exam Part I May 14, 2005 1. For which of the below systems is = where H is the Hamiltonian operator and T is the kinetic-energy operator? (a) The free particle (e) The

More information

arxiv: v1 [cond-mat.mtrl-sci] 21 Dec 2007

arxiv: v1 [cond-mat.mtrl-sci] 21 Dec 2007 Quantum Monte Carlo calculations of structural properties of FeO under pressure Jindřich Kolorenč and Lubos Mitas Department of Physics and CHiPS, North Carolina State University, Raleigh, North Carolina

More information

Simulation Methods II

Simulation Methods II Simulation Methods II Maria Fyta Institute for Computational Physics Universität Stuttgart Summer Term 2018 SM II - contents First principles methods Hartree-Fock and beyond Density-funtional-theory Ab

More information

QMC dissociation energies of three-electron hemibonded radical cation dimers... and water clusters

QMC dissociation energies of three-electron hemibonded radical cation dimers... and water clusters QMC dissociation energies of three-electron hemibonded radical cation dimers... and water clusters Idoia G. de Gurtubay TCM group. Cavendish Laboratory University of Cambridge Idoia G. de Gurtubay. Quantum

More information

Introduction to density-functional theory. Emmanuel Fromager

Introduction to density-functional theory. Emmanuel Fromager Institut de Chimie, Strasbourg, France Page 1 Emmanuel Fromager Institut de Chimie de Strasbourg - Laboratoire de Chimie Quantique - Université de Strasbourg /CNRS M2 lecture, Strasbourg, France. Institut

More information

Exploring deep Earth minerals with accurate theory

Exploring deep Earth minerals with accurate theory Exploring deep Earth minerals with accurate theory K.P. Driver, R.E. Cohen, Z. Wu, B. Militzer, P. Lopez Rios, M. Towler, R. Needs, and J.W. Wilkins Funding: NSF, DOE; Computation: NCAR, TeraGrid, NCSA,

More information

Quantum Monte Carlo study of the ground state of the two-dimensional Fermi fluid

Quantum Monte Carlo study of the ground state of the two-dimensional Fermi fluid Quantum Monte Carlo study of the ground state of the two-dimensional Fermi fluid N. D. Drummond and R. J. Needs TCM Group, Cavendish Laboratory, University of Cambridge, J. J. Thomson Avenue, Cambridge

More information

3: Many electrons. Orbital symmetries. l =2 1. m l

3: Many electrons. Orbital symmetries. l =2 1. m l 3: Many electrons Orbital symmetries Atomic orbitals are labelled according to the principal quantum number, n, and the orbital angular momentum quantum number, l. Electrons in a diatomic molecule experience

More information

Brief review of Quantum Mechanics (QM)

Brief review of Quantum Mechanics (QM) Brief review of Quantum Mechanics (QM) Note: This is a collection of several formulae and facts that we will use throughout the course. It is by no means a complete discussion of QM, nor will I attempt

More information

Electronic structure quantum Monte Carlo methods and variable spins: beyond fixedphase/node

Electronic structure quantum Monte Carlo methods and variable spins: beyond fixedphase/node Electronic structure quantum Monte Carlo methods and variable spins: beyond fixedphase/node approximations Cody Melton, M. Chandler Bennett, L. Mitas, with A. Ambrosetti, F. Pederiva North Carolina State

More information

( ) x10 8 m. The energy in a mole of 400 nm photons is calculated by: ' & sec( ) ( & % ) 6.022x10 23 photons' E = h! = hc & 6.

( ) x10 8 m. The energy in a mole of 400 nm photons is calculated by: ' & sec( ) ( & % ) 6.022x10 23 photons' E = h! = hc & 6. Introduction to Spectroscopy Spectroscopic techniques are widely used to detect molecules, to measure the concentration of a species in solution, and to determine molecular structure. For proteins, most

More information

Yingwei Wang Computational Quantum Chemistry 1 Hartree energy 2. 2 Many-body system 2. 3 Born-Oppenheimer approximation 2

Yingwei Wang Computational Quantum Chemistry 1 Hartree energy 2. 2 Many-body system 2. 3 Born-Oppenheimer approximation 2 Purdue University CHM 67300 Computational Quantum Chemistry REVIEW Yingwei Wang October 10, 2013 Review: Prof Slipchenko s class, Fall 2013 Contents 1 Hartree energy 2 2 Many-body system 2 3 Born-Oppenheimer

More information

Self-Consistent Implementation of Self-Interaction Corrected DFT and of the Exact Exchange Functionals in Plane-Wave DFT

Self-Consistent Implementation of Self-Interaction Corrected DFT and of the Exact Exchange Functionals in Plane-Wave DFT Self-Consistent Implementation of Self-Interaction Corrected DFT and of the Exact Exchange Functionals in Plane-Wave DFT Kiril Tsemekhman (a), Eric Bylaska (b), Hannes Jonsson (a,c) (a) Department of Chemistry,

More information

Lecture 5. Hartree-Fock Theory. WS2010/11: Introduction to Nuclear and Particle Physics

Lecture 5. Hartree-Fock Theory. WS2010/11: Introduction to Nuclear and Particle Physics Lecture 5 Hartree-Fock Theory WS2010/11: Introduction to Nuclear and Particle Physics Particle-number representation: General formalism The simplest starting point for a many-body state is a system of

More information