Advanced Molecular Science: Electronic Structure Theory

Krzysztof Szalewicz et al.
Department of Physics and Astronomy, University of Delaware, Newark, DE 19716, USA
(Dated: December 17, 2017)

Abstract

These lecture notes were prepared during a one-semester course at the University of Delaware. Some lectures were given by students and the corresponding notes were also prepared by students. The goal of this course was to cover the material from first principles, assuming only knowledge of standard quantum mechanics at the advanced undergraduate level. Thus, all the concepts are defined and all theorems are proved. Some forward-looking material is stated without proof; which material this is should be obvious from the context. About 95% of the material given in the notes was actually presented in class, in the traditional blackboard-and-chalk manner.

CONTENTS

I. Introduction
   A. Spinorbitals
   B. Products of complete basis sets
II. Symmetries of many-particle functions
   A. Symmetric group
   B. Determinant
III. Separation of nuclear and electronic motion
   A. Hamiltonian in relative coordinates
   B. Born-Oppenheimer approximation
   C. Adiabatic approximation and nonadiabatic correction
IV. The independent-particle model: the Hartree-Fock method
   A. Slater determinant and antisymmetrizer
   B. Slater-Condon rules
   C. Derivation of Hartree-Fock equations
V. Second-quantization formalism
   A. Annihilation and creation operators
   B. Products and commutators of operators
   C. Hamiltonian and number operator
   D. Normal products and Wick's theorem
      Normal product
      Contractions (pairings)
      Time-independent Wick's theorem
      Outline of proof of Wick's theorem
      Comprehensive proof of Wick's theorem
      Particle-hole formalism
      Normal products and Wick's theorem relative to the Fermi vacuum
      Generalized Wick's theorem
      Normal-product form of operators with respect to Fermi's vacuum
VI. Density-functional theory
   A. Thomas-Fermi-Dirac method
   B. Hohenberg-Kohn theorems
   C. Kohn-Sham method
   D. Local density approximation
   E. Generalized gradient approximations (GGA)
   F. Beyond GGA
VII. Variational method
   A. Configuration interaction (CI) method
      Size extensivity of CI
      MCSCF, CASSCF, RASSCF, and MRCI
   B. Basis sets and basis set convergence
   C. Explicitly-correlated methods
      Coulomb cusp
      Hylleraas function
      Slater geminals
      Explicitly-correlated Gaussian functions
VIII. Many-body perturbation theory (MBPT)
   A. Rayleigh-Schrödinger perturbation theory (classical derivation)
   B. Hylleraas variation principle
   C. Møller-Plesset perturbation theory
   D. Diagrammatic expansions for MPPT
      Diagrammatic notation
      One-particle operator
      Two-particle operators
      Hugenholtz diagrams
      Antisymmetrized Goldstone diagrams
      Diagrammatic representation of RSPT
   E. Time versions
      Time version of the first kind
      Time version of the second kind
   F. Connected and disconnected diagrams
   G. Linked and unlinked diagrams
   H. Factorization lemma (Frantz and Mills)
   I. Linked-cluster theorem
   J. Removal of spin
IX. Coupled cluster theory
   A. Exponential ansatz
   B. Size consistency
   C. CC method with double excitations
   D. Equivalence of CC and MBPT theory
   E. Noniterative triple excitations correction
   F. Full triple and higher excitations
X. Linear response theory
   A. Response function
      Density-density response function
      Calculation of properties from response functions
   B. Linear response in CC approach
      CC equations
      Hellmann-Feynman theorem
      Linear response CC for static perturbation
      Lambda equations
XI. Treatment of excited states
   A. Excitation energies from TD-DFT
   B. Limitations of single-reference CC methods
   C. The equation-of-motion coupled-cluster method
   D. Multireference coupled-cluster methods
XII. Intermolecular interactions
   A. Symmetry-adapted perturbation theory
   B. Asymptotic expansion of interaction energy
   C. Intermolecular interactions in DFT
XIII. Diffusion Monte Carlo
XIV. Density-matrix approaches
   A. Reduced density matrices
   B. Spinless density matrices
   C. N-representability
   D. Density matrix functional theory
   E. Contracted Schrödinger equation
XV. Density matrix renormalization group (DMRG)
   A. Singular value decomposition
   B. SVD applications
   C. DMRG wave function
   D. Expectation values and diagrammatic notation
   E. Matrix product ansatz
   F. DMRG algorithm
   G. DMRG in practice
   H. Dynamic correlation and excited states
   I. Applications to atoms and molecules
   J. Limitations

I. INTRODUCTION

The subject of these lecture notes is methods of solving Schrödinger's equation for atoms, molecules, biomolecular aggregates, and solids. Schrödinger's equation provides a very accurate description of most types of matter under most conditions, where by "most" we mean the materials and conditions found on Earth. Exceptions include materials that contain heavy atoms, where relativistic effects have to be accounted for, and high-precision measurements, where not only relativistic but also quantum electrodynamics (QED) effects play a role. We will restrict our attention to solutions of Schrödinger's equation. Relativistic and QED effects can be incorporated by a fairly straightforward extension of the methods discussed here. We will also restrict our attention to systems built of electrons and nuclei treated as point particles. Thus, we will not consider phenomena which involve nuclear reactions. However, many of the methods discussed here are also used in theoretical nuclear physics.

For systems with up to 4 electrons and 1 to 3 nuclei, one can now solve Schrödinger's equation to almost any desired precision, although for the most complicated systems of this type this requires huge amounts of computer resources. Some of the methods used in such calculations, such as the variational method with explicitly correlated functions (i.e., functions depending explicitly on electron-electron distances), will be briefly discussed here, but we will devote most of the time to larger systems, for which such methods are not applicable.

The difficulty of solving Schrödinger's equation for systems with 5 or more electrons originates from the dimensionality of the problem. For example, the benzene molecule contains 12 nuclei and 42 electrons, so that Schrödinger's equation for this system is 162-dimensional. Thus, this equation can be solved only by making approximations (although the quantum diffusion Monte Carlo methods considered later on do solve such equations "almost" directly). The main approximation applied is the many-particle (or many-electron or many-body) expansion. Therefore, most of the material covered here belongs to the branch of physics called many-body physics. The concept of the many-particle expansion is based on the observation that in a many-particle system the most important interactions are those involving only two particles. This leads to several methods that are hierarchical in the number of particle interactions considered.

The particles that we will consider almost exclusively will be electrons. Many-particle theories applied to bound states of such particles are known as electronic structure theory. The reasons for using the word "structure" are unclear, but it probably relates to the shell structure of atoms and the orbital picture of molecules.
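The dimensionality argument above can be made concrete with a few lines of arithmetic. The following is a minimal Python sketch, not part of the original notes: the function name and the choice of 10 grid points per coordinate are illustrative assumptions, used only to show why a direct grid representation of a 162-dimensional wave function is out of reach.

```python
def schrodinger_dimension(n_nuclei, n_electrons):
    """Number of spatial coordinates: three per particle (spin is ignored)."""
    return 3 * (n_nuclei + n_electrons)


# Benzene, C6H6: 12 nuclei and 6*6 + 6*1 = 42 electrons, as quoted in the text.
benzene_dim = schrodinger_dimension(n_nuclei=12, n_electrons=42)

# Hypothetical, very coarse grid: 10 points along each coordinate (an assumption).
grid_points_per_axis = 10
n_grid_values = grid_points_per_axis ** benzene_dim

print("dimension:", benzene_dim)
print("grid values needed: 10 to the power", len(str(n_grid_values)) - 1)
```

Even this coarse grid would require storing more values than could ever be held in memory, which is the practical motivation for the many-particle expansion.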

A. Spinorbitals

Electrons are fermions of spin $1/2$. The wave function of a single electron depends on the space coordinate $\mathbf{r} = [x, y, z]$, with each variable in the range $(-\infty, \infty)$, and on the spin coordinate $s$, which takes only the values $\pm 1/2$. Therefore, the wave function for a single electron, called a spinorbital, can be written in the form of the so-called spinor

\[ \begin{bmatrix} \psi_+(\mathbf{r}) \\ \psi_-(\mathbf{r}) \end{bmatrix} = \psi(\mathbf{r}, s) \equiv \psi(x), \]

where the $\psi_+$ component is the amplitude of finding the electron with spin projection (the eigenvalue of the operator $S_z$) equal to $\hbar/2$ ("spin up"), whereas the $\psi_-$ component is the amplitude of finding this particle with spin projection $-\hbar/2$ ("spin down"). Note that an electron in this state is in general in a mixed spin state, i.e., a superposition of spin up and spin down. It will be more convenient to use the other form of the wave function shown in the equation above:

\[ \psi_+(\mathbf{r}) = \psi(\mathbf{r}, \tfrac{1}{2}), \qquad \psi_-(\mathbf{r}) = \psi(\mathbf{r}, -\tfrac{1}{2}). \]

This form is particularly convenient to use in the expectation values of operators (matrix elements) involving spinorbitals. In the one-electron case

\[ \langle \psi | f | \phi \rangle = \sum_{s=-1/2,\,1/2} \int d^3r \, \psi^*(x) f(\mathbf{r}) \phi(x), \]

i.e., we sum over the spin variable and integrate over the three space variables. Since $f$ does not contain spin operators, the sum over the spin degrees of freedom can be computed immediately.

In most cases we will consider pure spin states, i.e., states with the property that either $\psi_+(\mathbf{r})$ or $\psi_-(\mathbf{r})$ is zero. For example,

\[ \begin{bmatrix} \psi_+(\mathbf{r}) \\ 0 \end{bmatrix} \]

is the spinorbital with spin projection $\hbar/2$, or the "spin alpha ($\alpha$)" state, whereas the other option is called the "spin beta ($\beta$)" state. In such cases, the pure-spin spinorbitals can be denoted as $\psi_+(\mathbf{r})$ and $\psi_-(\mathbf{r})$, or $\psi(\mathbf{r})\alpha(s)$ and $\psi(\mathbf{r})\beta(s)$, or $\psi_\alpha(\mathbf{r})$ and $\psi_\beta(\mathbf{r})$, where $\alpha(1/2) = \beta(-1/2) = 1$ and $\alpha(-1/2) = \beta(1/2) = 0$, and the spatial part is called the orbital. Note that $\psi_+(\mathbf{r})$ and $\psi_-(\mathbf{r})$ are now different spinorbitals, whereas before they were components of a single spinorbital. We also continue using the symbol $\psi(x)$, assuming that $\psi(x)|_{s=1/2}$ either describes pure spin alpha or is zero. One sometimes uses a somewhat confusing notation where the spinorbital and the orbital are denoted by the same symbol, so that we have, for example, $\psi(x) = \psi(\mathbf{r})\alpha(s)$. The meaning of this symbol should be obvious from the context.

If $\psi$ and $\phi$ represent the same spin projection, they are simultaneously nonzero at either $s = 1/2$ or $s = -1/2$ and zero at the opposite value, so that the spin summation can be performed and leaves only the spatial integral. If the spin projections are opposite, at each value of $s$ one of the spinorbitals is zero, so that the result of the summation over spins is zero. This becomes more transparent in the alpha/beta notation, for example,

\[ \langle \psi\alpha | f | \phi\alpha \rangle = \sum_{s=-1/2,\,1/2} \alpha^2(s) \int d^3r \, \psi^*(\mathbf{r}) f(\mathbf{r}) \phi(\mathbf{r}) = \int d^3r \, \psi^*(\mathbf{r}) f(\mathbf{r}) \phi(\mathbf{r}), \]

and obviously for opposite spins one gets zero.

B. Products of complete basis sets

One of the main theorems used in many-body theory states that a complete basis set in the space of many-particle functions can be formed as a product of complete basis sets of single-particle functions. Let us show that this is the case on the simplest example of a function of two variables. Let us assume that $\{g_i(x)\}_{i=1}^{\infty}$ is a complete basis set in the space of functions of one variable. Then the set of products $\{g_i(x) g_j(y)\}$ is a complete set in the space of functions of two variables, i.e., any function $f(x, y)$ can be expanded in this set:

\[ f(x, y) = \sum_{i,j=1}^{\infty} c_{ij} \, g_i(x) g_j(y). \]

To see that this is indeed the case, consider the function $f(x, y)$ at a fixed value of $x$ denoted by $x_0$. Since $f(x_0, y)$ is just a function of a single variable, we may write

\[ f(x_0, y) = \sum_j d_j(x_0) \, g_j(y). \]

However, taken at different values of $x_0$, $d_j(x_0)$ is just a single-variable function and can be expanded in our basis,

\[ d_j(x) = \sum_i c_{ij} \, g_i(x), \]

which proves the theorem.

II. SYMMETRIES OF MANY-PARTICLE FUNCTIONS

Since electrons are fermions, the electronic wave functions have to be antisymmetric. This chapter will show how to achieve this goal.
The notion of antisymmetry is related to permutations of the electrons' coordinates. Therefore, we will start with a discussion of the permutation group.

A. Symmetric group

The permutation group, also known as the symmetric group, is the group of all operations on a set of $N$ distinct objects that order the objects in all possible ways. The group is denoted by $S_N$ (we will show below that it is indeed a group). We will call these operations permutations and denote them by the symbol $\sigma_i$. For a set consisting of the numbers $1, 2, \ldots, N$, the permutation $\sigma_i$ orders these numbers in such a way that $k$ is at the $j$th position. Often a better way of looking at permutations is to say that permutations are all mappings of the set $1, 2, \ldots, N$ onto itself: $\sigma_i(k) = j$, where $j$ has to go over all elements.

The number of permutations is $N!$. Indeed, we can first place any of the objects at position 1, so there are $N$ possible placements. For each case, we can place one of the remaining $N - 1$ objects at the second position, so that the number of possible arrangements is now $N(N - 1)$. Continuing in this way, we prove the theorem. For three numbers, 1, 2, 3, there are the following $3! = 6$ arrangements: 123, 132, 213, 231, 312, 321.

One can use the following "matrix" to denote permutations:

\[ \sigma = \begin{pmatrix} 1 & 2 & \ldots & k & \ldots & N \\ \sigma(1) & \sigma(2) & \ldots & \sigma(k) & \ldots & \sigma(N) \end{pmatrix}. \]

The order of columns in the matrix above is convenient, but note that if the columns were ordered differently, this would still be the same permutation. An illustrative example of a permutation in this notation, for $N = 3$, is

\[ \sigma = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \end{pmatrix}, \]

i.e., $\sigma(1) = 2$, $\sigma(2) = 3$, $\sigma(3) = 1$.

We define the operation of multiplication within the set of permutations as $(\sigma \circ \sigma')(k) = \sigma(\sigma'(k))$. For example, if

\[ \sigma_1 = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix}, \qquad \sigma_2 = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 2 \end{pmatrix}, \]

then

\[ \sigma_2 \circ \sigma_1 = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \end{pmatrix}. \]

We can now check that these operations satisfy the group postulates.

Closure: $\sigma \circ \sigma' \in S_N$. The proof is obvious since the product of permutations maps each number from the set onto a number from the set and is therefore a permutation.

Existence of unity $I$: this is the permutation $\sigma(k) = k$.

Existence of inverse, i.e., for each $\sigma$ there exists $\sigma^{-1}$ such that $\sigma \circ \sigma^{-1} = I$. Clearly, the inverse can be defined such that if $\sigma(k) = j$, then $\sigma^{-1}(j) = k$.

Multiplications are associative: $\sigma_3 \circ (\sigma_2 \circ \sigma_1) = (\sigma_3 \circ \sigma_2) \circ \sigma_1$. The proof is a homework problem.

One important theorem resulting from these definitions is that the set of products of a single permutation with all elements of $S_N$ is equal to $S_N$:

\[ \sigma \circ S_N = S_N. \]

Proof: Due to closure, the only possibility of not reproducing the whole group is that two different elements of $S_N$, $\sigma' \neq \sigma''$, are mapped by $\sigma$ onto the same element: $\sigma \circ \sigma' = \sigma''' = \sigma \circ \sigma''$. Multiplying this equation from the left by $\sigma^{-1}$, we get $\sigma' = \sigma''$, which contradicts our assumption.

Another theorem states that $\{\sigma^{-1}\} = S_N$. This is equivalent to saying that $\sigma$ and $\sigma^{-1}$ are in one-to-one correspondence. Indeed, assume that there are two permutations that are inverse to $\sigma$: $\sigma_1 \circ \sigma = I = \sigma_2 \circ \sigma$. Multiplying this by $\sigma^{-1}$ from the right, we get that $\sigma_1 = \sigma_2$.

One important property of permutations is that each permutation can be written as a product of the simplest possible permutations, called transpositions. A transposition is a permutation involving only two elements:

\[ \tau = \tau_{ij} = (ij) = \begin{cases} \sigma(i) = j \\ \sigma(j) = i \\ \sigma(k) = k \ \text{for} \ k \neq i, j \end{cases} = \begin{pmatrix} 1 & 2 & \ldots & i & \ldots & j & \ldots & N \\ 1 & 2 & \ldots & j & \ldots & i & \ldots & N \end{pmatrix}. \]

To prove that any permutation can be written as a product of transpositions, we just construct such a product. For a permutation $\sigma$ written as

\[ \sigma = \begin{pmatrix} 1 & 2 & \ldots & k & \ldots & N \\ i_1 & i_2 & \ldots & i_k & \ldots & i_N \end{pmatrix}, \tag{1} \]

first find $i_1$ in the set $\{1, 2, \ldots, N\}$ and then transpose it with 1 (unless $i_1 = 1$, in which case do nothing). This places $i_1$ at position 1. Then consider the set with $i_1$ removed, find $i_2$, and transpose it with the element at position 2. Continuing in this way, we get the mapping of expression (1), which proves the theorem.

The decomposition of a permutation into transpositions is not unique, as we can always add $\tau_{ij} \circ \tau_{ij} = I$. Although the number of transpositions in a decomposition is not unique, this number is always either odd or even for a given permutation. The proof of this important theorem is given as a homework. Thus, $(-1)^{\pi_\sigma}$, where $\pi_\sigma$ is the number of transpositions in an arbitrary decomposition, is always either $1$ or $-1$ for a given permutation, and we can classify each permutation as either even or odd. We say that each permutation has a definite parity.

One theorem concerning the parity of permutations is that $(-1)^{\pi_\sigma} = (-1)^{\pi_{\sigma^{-1}}}$, i.e., that a permutation and its inverse have the same parity. This results from the fact that each transposition is its own inverse, so that the inverse of a product of transpositions is the product of the same transpositions in reverse order.

B. Determinant

The fundamental zeroth-order approximation for the wave function in the theory of many fermions is Slater's determinant. Thus, we have to study the concept of the determinant. For a general $N \times N$ matrix $A$ with elements $a_{ij}$, the determinant is defined as

\[ |A| \equiv \det A = \begin{vmatrix} a_{11} & a_{12} & \ldots & a_{1N} \\ a_{21} & a_{22} & \ldots & a_{2N} \\ \vdots & \vdots & & \vdots \\ a_{N1} & a_{N2} & \ldots & a_{NN} \end{vmatrix} = \sum_\sigma (-1)^{\pi_\sigma} a_{\sigma(1)1} a_{\sigma(2)2} \cdots a_{\sigma(N)N}, \tag{2} \]

where the sum is over all permutations of the numbers 1 to $N$ and $\pi_\sigma$ is the parity of the permutation.

There are several important theorems involving determinants that we will now prove. First, let us show that $|A| = |A^T|$, which also means that the definition (2) can be written as

\[ |A| = \sum_\sigma (-1)^{\pi_\sigma} a_{1\sigma(1)} a_{2\sigma(2)} \cdots a_{N\sigma(N)}. \tag{3} \]

To prove this property, first consider $\sigma(i) = 1$. There must be one such $a_{i\sigma(i)}$ in each term in formula (3). Denote this value of $i$ in a given term by $i_1$ and move $a_{i_1 1}$ to the first position in the product:

\[ a_{1\sigma(1)} a_{2\sigma(2)} \cdots a_{i_1 1} \cdots a_{N\sigma(N)} = a_{i_1 1} \, a_{1\sigma(1)} a_{2\sigma(2)} \cdots a_{i_1 - 1, \sigma(i_1 - 1)} \, a_{i_1 + 1, \sigma(i_1 + 1)} \cdots a_{N\sigma(N)}. \]

Next, look for $\sigma(i) = 2 = \sigma(i_2)$ and move $a_{i_2 2}$ to the second position in the product. Continuing, one eventually gets

\[ a_{1\sigma(1)} a_{2\sigma(2)} \cdots a_{N\sigma(N)} = a_{i_1 1} a_{i_2 2} \cdots a_{i_k k} \cdots a_{i_N N}. \tag{4} \]

The set $i_1, i_2, \ldots, i_N$ is a permutation $\tilde{\sigma}$: $\tilde{\sigma}(k) = i_k$. Note that $\tilde{\sigma} \neq \sigma$ in general. Also, the permutations $\tilde{\sigma}$ originating from different terms in expansion (3) are all different.
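This is so since from $\sigma(i_k) = k$ and $i_k = \tilde{\sigma}(k)$ it follows that $\sigma(\tilde{\sigma}(k)) = k = (\sigma \circ \tilde{\sigma})(k)$. Thus, $\tilde{\sigma} = \sigma^{-1}$. Therefore, if we sum all possible terms on the right-hand side of Eq. (4), we sum over all permutations of $S_N$ (as shown earlier, $\{\sigma^{-1}\} = S_N$). The only remaining issue is the sign. The sign is right since we have proved that the parity of $\sigma$ and $\sigma^{-1}$ is the same. This completes the proof.

The next important theorem says that if one interchanges two columns (or rows) in a determinant, the value of the determinant changes sign:

\[ |A_{i \leftrightarrow j}| = -|A|, \]

where $A_{i \leftrightarrow j}$ denotes the matrix with such an interchange. The proof is as follows. We can assume without loss of generality that $i < j$. Denote $A = \{a_{kl}\}$ and $A_{i \leftrightarrow j} = \{\tilde{a}_{kl}\}$, with

\[ \tilde{a}_{kl} = a_{kl} \quad \text{if} \ l \neq i, j, \tag{5} \]

\[ \tilde{a}_{ki} = a_{kj}, \qquad \tilde{a}_{kj} = a_{ki}. \tag{6} \]

Therefore, in the expansions of $|A|$ and $|A_{i \leftrightarrow j}|$, we can identify identical terms, modulo sign. Pick a term in the expansion of $|A|$,

\[ (-1)^{\pi_\sigma} a_{\sigma(1)1} a_{\sigma(2)2} \cdots a_{\sigma(i)i} \cdots a_{\sigma(j)j} \cdots a_{\sigma(N)N}, \]

where $\sigma$ is here some fixed permutation of $1, 2, \ldots, N$. To find the corresponding term in the expansion of $|A_{i \leftrightarrow j}|$,

\[ |A_{i \leftrightarrow j}| = \sum_{\tilde{\sigma}} (-1)^{\pi_{\tilde{\sigma}}} \tilde{a}_{\tilde{\sigma}(1)1} \tilde{a}_{\tilde{\sigma}(2)2} \cdots \tilde{a}_{\tilde{\sigma}(i)i} \cdots \tilde{a}_{\tilde{\sigma}(j)j} \cdots \tilde{a}_{\tilde{\sigma}(N)N}, \]

we should choose $\tilde{\sigma}(k) = \sigma(k)$ for $k \neq i, j$ since, due to (5), $\tilde{a}_{\tilde{\sigma}(k)k} = a_{\sigma(k)k}$ if $k \neq i, j$. Analogously, $\tilde{\sigma}(i) = \sigma(j)$ and $\tilde{\sigma}(j) = \sigma(i)$ since, due to (6), $\tilde{a}_{\tilde{\sigma}(j)j} = a_{\tilde{\sigma}(j)i} = a_{\sigma(i)i}$, where the second equality results from our assumption $\tilde{\sigma}(j) = \sigma(i)$, and, similarly, $\tilde{a}_{\tilde{\sigma}(i)i} = a_{\tilde{\sigma}(i)j} = a_{\sigma(j)j}$. This can be done for all $N!$ terms in $|A|$, so that there is a one-to-one correspondence between terms, modulo sign. Since

\[ \tilde{\sigma}(k) = \begin{cases} \sigma(k), & k \neq i, j \\ \sigma(\tau_{ij}(k)), & k = i \ \text{or} \ j \end{cases} \; = (\sigma \circ \tau_{ij})(k) \]

(if $k \neq i, j$, $\tau_{ij}$ has no effect), the permutations $\sigma$ and $\tilde{\sigma}$ differ by one transposition and therefore $(-1)^{\pi_{\tilde{\sigma}}} = -(-1)^{\pi_\sigma}$, which proves the theorem.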
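The two theorems just proved can be checked numerically on a small matrix. The following is a minimal Python sketch, not part of the original notes: it evaluates the permutation expansion of Eq. (2) directly, obtaining the parity by sorting each permutation with transpositions in the spirit of the constructive decomposition described earlier. The function names and the test matrix are illustrative assumptions, and indices run from 0 rather than 1. The direct sum has $N!$ terms, so it is meant only for very small $N$.

```python
from itertools import permutations


def parity_sign(perm):
    """Return (-1)**(number of transpositions) for a permutation of 0..N-1.

    The permutation is brought to the identity by successive transpositions,
    and the sign flips once per transposition.
    """
    p = list(perm)
    sign = 1
    for k in range(len(p)):
        while p[k] != k:
            j = p[k]
            p[k], p[j] = p[j], p[k]  # one transposition puts j in its place
            sign = -sign
    return sign


def det(a):
    """Determinant from the permutation expansion, Eq. (2)."""
    n = len(a)
    total = 0
    for sigma in permutations(range(n)):
        term = parity_sign(sigma)
        for col in range(n):
            term *= a[sigma[col]][col]  # element a_{sigma(col), col}
        total += term
    return total


def transpose(a):
    return [list(row) for row in zip(*a)]


def swap_columns(a, i, j):
    b = [list(row) for row in a]
    for row in b:
        row[i], row[j] = row[j], row[i]
    return b


# Hypothetical integer test matrix, chosen so that all arithmetic is exact.
A = [[2, -1, 0],
     [1, 3, 4],
     [0, 5, -2]]

print(det(A), det(transpose(A)), det(swap_columns(A, 0, 2)))
```

Integer entries are used so that the comparisons $|A| = |A^T|$ and $|A_{i \leftrightarrow j}| = -|A|$ hold exactly rather than up to rounding.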

Another theorem states that if a column of a matrix is a linear combination of two (or more) column matrices, the determinant of this matrix is equal to the linear combination of determinants, each containing one of these column matrices:

\[ |A(a_j = \beta b + \gamma c)| = \beta \, |A(a_j = b)| + \gamma \, |A(a_j = c)|. \tag{7} \]

The proof follows from the fact that the definition of the determinant implies that each term in the expansion (2) contains exactly one element from each column and each row. Thus, each term contains the factor $\beta b_i + \gamma c_i$ and can be written as a sum of two terms. Pulling the coefficients in front of the determinants proves the theorem.

One more theorem, which is the subject of a homework, is that the determinant of a product of two matrices is the product of determinants: $|AB| = |A| \, |B|$. This theorem can be used to prove that the determinant of a unitary matrix, i.e., a matrix with the property $U U^\dagger = I$, where the dagger denotes a matrix which is transposed and complex conjugated, is a complex number of modulus 1. Indeed, denoting $z = |U|$,

\[ 1 = |U U^\dagger| = |U| \, |U^\dagger| = |U| \, |(U^T)^*| = |U| \, |U|^* = |z|^2, \]

where we used the theorem about the determinant of a transposed matrix.

Finally, a homework problem shows that the determinant of $A$ can be computed using the so-called Laplace expansion

\[ |A| = \sum_i (-1)^{i+j} a_{ij} |M_{ij}| = \sum_j (-1)^{i+j} a_{ij} |M_{ij}|, \]

where the matrix $M_{ij}$ is obtained from the matrix $A$ by removing the $i$th row and the $j$th column.

III. SEPARATION OF NUCLEAR AND ELECTRONIC MOTION

For a molecule consisting of $K$ particles, nuclei and electrons, the Hamiltonian is

\[ H = -\sum_{i=1}^{K} \frac{\hbar^2}{2 m_i} \nabla^2_{\mathbf{R}_i} + \sum_{i<j}^{K} \frac{q_i q_j}{|\mathbf{R}_i - \mathbf{R}_j|}, \tag{8} \]

where $m_i$ ($q_i$) is the mass (charge) of particle $i$ and all coordinates are measured in a space-fixed coordinate system. This leads to an equation in $3K$ dimensions. For example, for the hydrogen molecule, it is 12-dimensional. While for small molecules it is currently possible to solve Schrödinger's equation with this Hamiltonian, one can easily reduce the dimensionality.
First, one can rigorously separate the center-of-mass motion, reducing the number of dimensions by 3. Second, one can approximately separate the electronic and nuclear motions. For the hydrogen molecule, the resulting equation for the electronic motion is then six-dimensional. For molecules larger than the hydrogen molecule, the gain is not as dramatic since the number of electrons in molecules containing heavier atoms is much larger than the number of nuclei. Nevertheless, this separation is always performed since it is easier to solve equations that concern identical particles (i.e., electrons) than equations for several different kinds of particles.

The separation of nuclear and electronic motion is a good approximation since a nucleus is at least about 1800 times heavier than an electron (the proton-to-electron mass ratio is about 1836) and therefore nuclei move much more slowly than electrons. Thus, as the slow nuclei move, the fast electrons follow them and their distribution around the nuclei is not much different from that in the case of stationary nuclei. Such a separation of motions is called the adiabatic approximation. In the case of molecules, one more often uses the so-called Born-Oppenheimer (BO) approximation, which is a further simplification of the adiabatic one. The BO approximation, also called the clamped-nuclei approximation, just means that electrons move in the field of nuclei clamped in space. The solutions of the clamped-nuclei Schrödinger equation are called the electronic states.

In many cases, one has to go beyond the adiabatic approximation. This is needed for small molecules when one needs to get very accurate results, or for molecules of any size in certain regions of nuclear configurations where the adiabatic approximation breaks down due to strong interactions between energetically close electronic states. One usually starts from the adiabatic approximation and solves equations that couple the electronic and nuclear motions in a perturbative fashion, computing in this way the so-called nonadiabatic effects.

A. Hamiltonian in relative coordinates

To simplify notation, let us restrict our attention to diatomic molecules with nuclear masses $M_1$ and $M_2$. A generalization to molecules with more nuclei is straightforward.
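Let $\mathbf{R}_i$, $i = 1, 2$, denote the coordinates of the two nuclei, whereas the coordinates of the $N$ electrons will be denoted by $\mathbf{r}'_i$, all coordinates still in a space-fixed system. Now introduce the center of mass (CM)

\[ \mathbf{R}_{CM} = \frac{1}{M} \left( M_1 \mathbf{R}_1 + M_2 \mathbf{R}_2 + m_e \sum_{i=1}^{N} \mathbf{r}'_i \right), \]

where $m_e$ is the mass of an electron and $M = M_1 + M_2 + N m_e$ is the total mass of the molecule, and the relative coordinates

\[ \mathbf{R} = \mathbf{R}_1 - \mathbf{R}_2, \qquad \mathbf{r}_i = \mathbf{r}'_i - \tfrac{1}{2} (\mathbf{R}_1 + \mathbf{R}_2). \]

We have chosen to measure electronic positions from the geometric center of the nuclei. Another possible choice is to measure them from the center of nuclear mass.

To transform the Hamiltonian (8), we have to perform some chain-rule differentiations corresponding to the following change of variables:

\[ [\mathbf{R}_1, \mathbf{R}_2, \mathbf{r}'_1, \ldots, \mathbf{r}'_N] \rightarrow [\mathbf{R}_{CM}, \mathbf{R}, \mathbf{r}_1, \ldots, \mathbf{r}_N]. \]

For the $x$ components, the first derivatives are

\[ \frac{\partial}{\partial X_1} = \frac{M_1}{M} \frac{\partial}{\partial X_{CM}} + \frac{\partial}{\partial X} - \frac{1}{2} \sum_i \frac{\partial}{\partial x_i}, \]

\[ \frac{\partial}{\partial X_2} = \frac{M_2}{M} \frac{\partial}{\partial X_{CM}} - \frac{\partial}{\partial X} - \frac{1}{2} \sum_i \frac{\partial}{\partial x_i}, \]

\[ \frac{\partial}{\partial x'_i} = \frac{m_e}{M} \frac{\partial}{\partial X_{CM}} + \frac{\partial}{\partial x_i}. \]

Now the second derivatives:

\[ \frac{\partial^2}{\partial X_1^2} = \left( \frac{M_1}{M} \right)^2 \frac{\partial^2}{\partial X_{CM}^2} + \frac{\partial^2}{\partial X^2} + \frac{1}{4} \left( \sum_i \frac{\partial}{\partial x_i} \right)^2 + 2 \frac{M_1}{M} \frac{\partial^2}{\partial X_{CM} \partial X} - \frac{M_1}{M} \frac{\partial}{\partial X_{CM}} \sum_i \frac{\partial}{\partial x_i} - \frac{\partial}{\partial X} \sum_i \frac{\partial}{\partial x_i}, \]

\[ \frac{\partial^2}{\partial X_2^2} = \left( \frac{M_2}{M} \right)^2 \frac{\partial^2}{\partial X_{CM}^2} + \frac{\partial^2}{\partial X^2} + \frac{1}{4} \left( \sum_i \frac{\partial}{\partial x_i} \right)^2 - 2 \frac{M_2}{M} \frac{\partial^2}{\partial X_{CM} \partial X} - \frac{M_2}{M} \frac{\partial}{\partial X_{CM}} \sum_i \frac{\partial}{\partial x_i} + \frac{\partial}{\partial X} \sum_i \frac{\partial}{\partial x_i}, \]

\[ \frac{\partial^2}{\partial x_i'^2} = \left( \frac{m_e}{M} \right)^2 \frac{\partial^2}{\partial X_{CM}^2} + \frac{\partial^2}{\partial x_i^2} + 2 \frac{m_e}{M} \frac{\partial^2}{\partial X_{CM} \partial x_i}. \]

Plug the derivatives into the kinetic energy part of the Hamiltonian:

\[ \begin{aligned} T_x = {} & -\frac{\hbar^2 M_1}{2 M^2} \frac{\partial^2}{\partial X_{CM}^2} - \frac{\hbar^2}{2 M_1} \frac{\partial^2}{\partial X^2} - \frac{\hbar^2}{8 M_1} \left( \sum_i \frac{\partial}{\partial x_i} \right)^2 - \frac{\hbar^2}{M} \frac{\partial^2}{\partial X_{CM} \partial X} + \frac{\hbar^2}{2 M} \frac{\partial}{\partial X_{CM}} \sum_i \frac{\partial}{\partial x_i} + \frac{\hbar^2}{2 M_1} \frac{\partial}{\partial X} \sum_i \frac{\partial}{\partial x_i} \\ & -\frac{\hbar^2 M_2}{2 M^2} \frac{\partial^2}{\partial X_{CM}^2} - \frac{\hbar^2}{2 M_2} \frac{\partial^2}{\partial X^2} - \frac{\hbar^2}{8 M_2} \left( \sum_i \frac{\partial}{\partial x_i} \right)^2 + \frac{\hbar^2}{M} \frac{\partial^2}{\partial X_{CM} \partial X} + \frac{\hbar^2}{2 M} \frac{\partial}{\partial X_{CM}} \sum_i \frac{\partial}{\partial x_i} - \frac{\hbar^2}{2 M_2} \frac{\partial}{\partial X} \sum_i \frac{\partial}{\partial x_i} \\ & -\frac{\hbar^2 N m_e}{2 M^2} \frac{\partial^2}{\partial X_{CM}^2} - \frac{\hbar^2}{2 m_e} \sum_i \frac{\partial^2}{\partial x_i^2} - \frac{\hbar^2}{M} \frac{\partial}{\partial X_{CM}} \sum_i \frac{\partial}{\partial x_i}. \end{aligned} \]

Terms 4 and 10 cancel, and so do terms 5, 11, and 15. Terms 1, 7, and 13 can be added together and the masses in the numerators add to $M$. We therefore now get

\[ T_x = -\frac{\hbar^2}{2M} \frac{\partial^2}{\partial X_{CM}^2} - \frac{\hbar^2}{2\mu} \frac{\partial^2}{\partial X^2} - \frac{\hbar^2}{2 m_e} \sum_i \frac{\partial^2}{\partial x_i^2} - \frac{\hbar^2}{8\mu} \left( \sum_i \frac{\partial}{\partial x_i} \right)^2 + \frac{\hbar^2}{2} \left( \frac{1}{M_1} - \frac{1}{M_2} \right) \frac{\partial}{\partial X} \sum_i \frac{\partial}{\partial x_i}, \]

where $1/\mu = 1/M_1 + 1/M_2$. Since the CM coordinates appear only in the first term, the center-of-mass motion can be separated. After adding the terms in the other two directions, the remaining Hamiltonian, expressed only in relative coordinates, can be written as

\[ H = -\frac{\hbar^2}{2\mu} \nabla^2_{\mathbf{R}} - \frac{\hbar^2}{2 m_e} \sum_i \nabla^2_{\mathbf{r}_i} - \frac{\hbar^2}{8\mu} \left( \sum_i \nabla_{\mathbf{r}_i} \right)^2 - \frac{\hbar^2}{2\mu_a} \nabla_{\mathbf{R}} \cdot \sum_i \nabla_{\mathbf{r}_i} + V, \]

where we denoted

\[ \frac{1}{M_2} - \frac{1}{M_1} = \frac{1}{\mu_a} \]

and $V$ denotes the second term in the Hamiltonian (8). Since this term contains only interparticle distances, it is unaffected by the transformation.

B. Born-Oppenheimer approximation

The Hamiltonian can be divided into two parts:

\[ H = H_0 + H', \tag{9} \]

\[ H_0 = -\frac{\hbar^2}{2 m_e} \sum_i \nabla^2_{\mathbf{r}_i} + V, \tag{10} \]

\[ H' = -\frac{\hbar^2}{2\mu} \nabla^2_{\mathbf{R}} - \frac{\hbar^2}{8\mu} \left( \sum_i \nabla_{\mathbf{r}_i} \right)^2 - \frac{\hbar^2}{2\mu_a} \nabla_{\mathbf{R}} \cdot \sum_i \nabla_{\mathbf{r}_i}. \tag{11} \]

The Hamiltonian $H_0$ is called the electronic Hamiltonian since it acts only on electronic coordinates. It is also called the clamped-nuclei Hamiltonian since it describes the system if $H'$ is neglected (and $H'$ becomes zero if the nuclear masses go to infinity, so that the nuclei do not move, i.e., are clamped in space). Such an approach is called the BO approximation. The electronic Schrödinger equation is

\[ \left[ -\frac{\hbar^2}{2 m_e} \sum_i \nabla^2_{\mathbf{r}_i} + V \right] \psi(\mathbf{r}_1, \ldots, \mathbf{r}_N; R) = E(R) \, \psi(\mathbf{r}_1, \ldots, \mathbf{r}_N; R). \]

Since the equation is different for each internuclear separation $R = |\mathbf{R}|$, the wave function and the energy depend parametrically on $R$. We use the word "parametrically" to emphasize that $R$ is not a variable in the electronic Schrödinger equation, but the equation has to be solved separately for each value of $R$ that is of interest. For molecules with more than two nuclei, the electronic wave function depends parametrically on the positions of all nuclei.

Despite the name "clamped-nuclei" approximation, one does solve for the nuclear motion in the BO approximation. To do this, one assumes the exact wave function to be a product of the electronic wave function and of a function of $\mathbf{R}$:

\[ \Psi(\mathbf{r}_1, \ldots, \mathbf{r}_N, \mathbf{R}) \approx \psi(\mathbf{r}_1, \ldots, \mathbf{r}_N; R) \, f(\mathbf{R}). \]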

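Next, approximate $H'$ by its first term and plug this function into the resulting approximate Schrödinger equation

\[ \left[ -\frac{\hbar^2}{2\mu} \nabla^2_{\mathbf{R}} - \frac{\hbar^2}{2 m_e} \sum_i \nabla^2_{\mathbf{r}_i} + V \right] \psi(\mathbf{r}_1, \ldots, \mathbf{r}_N; R) f(\mathbf{R}) = E \, \psi(\mathbf{r}_1, \ldots, \mathbf{r}_N; R) f(\mathbf{R}). \]

The function $f$ can be pulled out from the second and third terms of the Hamiltonian. We now make one more approximation and neglect the terms resulting from the action of the first term on $\psi$. Then we can write

\[ -\frac{\hbar^2}{2\mu} \psi(\mathbf{r}_1, \ldots, \mathbf{r}_N; R) \nabla^2_{\mathbf{R}} f(\mathbf{R}) + f(\mathbf{R}) \left[ -\frac{\hbar^2}{2 m_e} \sum_i \nabla^2_{\mathbf{r}_i} + V \right] \psi(\mathbf{r}_1, \ldots, \mathbf{r}_N; R) = E \, \psi(\mathbf{r}_1, \ldots, \mathbf{r}_N; R) f(\mathbf{R}). \]

By the electronic Schrödinger equation, the bracket acting on $\psi$ gives $E(R)\psi$. We multiply by $\psi^*$ and integrate over electron coordinates, assuming $\langle \psi | \psi \rangle = 1$ for all $R$. We then get

\[ \left[ -\frac{\hbar^2}{2\mu} \nabla^2_{\mathbf{R}} + E(R) \right] f(\mathbf{R}) = E f(\mathbf{R}). \]

Thus, the electronic energy becomes the potential energy surface for the motion of nuclei.

C. Adiabatic approximation and nonadiabatic correction

The BO approximation discussed above can be obtained from a more rigorous procedure that originates from the exact solutions of the Schrödinger equation for all particles. We can expand such a solution in complete basis sets in electronic and nuclear coordinates using the theorem discussed earlier:

\[ \Psi(\mathbf{r}_1, \ldots, \mathbf{r}_N, \mathbf{R}) = \sum_{ij} c_{ij} \, \psi_i(\mathbf{r}_1, \ldots, \mathbf{r}_N) g_j(\mathbf{R}) = \sum_i \psi_i(\mathbf{r}_1, \ldots, \mathbf{r}_N) \sum_j c_{ij} g_j(\mathbf{R}) = \sum_i \psi_i(\mathbf{r}_1, \ldots, \mathbf{r}_N) f_i(\mathbf{R}), \]

where the last, equivalent form is more convenient to use. However, since we want to use the solutions of the electronic Schrödinger equation rather than some arbitrary complete basis set, our expansion becomes

\[ \Psi(\mathbf{r}_1, \ldots, \mathbf{r}_N, \mathbf{R}) = \sum_j \psi_j(\mathbf{r}_1, \ldots, \mathbf{r}_N; R) f_j(\mathbf{R}). \tag{12} \]

One can view this expression as using a different complete basis set for each $R$. We now insert the expansion (12) into Schrödinger's equation (with CM separated), multiply by $\psi_i^*(\mathbf{r}_1, \ldots, \mathbf{r}_N; R)$, and integrate over electronic coordinates. Let us work out the first term in the operator $H'$:

\[ -\frac{\hbar^2}{2\mu} \sum_j \langle \psi_i | \nabla^2_{\mathbf{R}} ( \psi_j f_j ) \rangle = -\frac{\hbar^2}{2\mu} \sum_j \left[ \langle \psi_i | ( \nabla^2_{\mathbf{R}} \psi_j ) \rangle f_j + \langle \psi_i | \psi_j \rangle \nabla^2_{\mathbf{R}} f_j + 2 \langle \psi_i | ( \nabla_{\mathbf{R}} \psi_j ) \rangle \cdot \nabla_{\mathbf{R}} f_j \right], \tag{13} \]

where the parentheses inside integrals indicate that differentiations with respect to $\mathbf{R}$ are performed only inside the parentheses. Similarly, for the third term we get

\[ -\frac{\hbar^2}{2\mu_a} \sum_j \langle \psi_i | \nabla_{\mathbf{R}} \cdot \sum_k \nabla_{\mathbf{r}_k} ( \psi_j f_j ) \rangle = -\frac{\hbar^2}{2\mu_a} \sum_j \left[ \langle \psi_i | \nabla_{\mathbf{R}} \cdot \sum_k \nabla_{\mathbf{r}_k} \psi_j \rangle f_j + \langle \psi_i | \sum_k \nabla_{\mathbf{r}_k} \psi_j \rangle \cdot \nabla_{\mathbf{R}} f_j \right]. \tag{14} \]

The sum of the first term in Eq. (13), of the matrix element of the second operator in Eq. (11), and of the first term in Eq. (14) can be written as

\[ -\frac{\hbar^2}{2\mu} \sum_j \langle \psi_i | ( \nabla^2_{\mathbf{R}} \psi_j ) \rangle f_j - \frac{\hbar^2}{8\mu} \sum_j \langle \psi_i | \Big( \sum_k \nabla_{\mathbf{r}_k} \Big)^2 \psi_j \rangle f_j - \frac{\hbar^2}{2\mu_a} \sum_j \langle \psi_i | \nabla_{\mathbf{R}} \cdot \sum_k \nabla_{\mathbf{r}_k} \psi_j \rangle f_j \tag{15} \]

\[ = \sum_j H'_{ij} f_j, \tag{16} \]

where $H'_{ij}$ are the matrix elements of $H'$ between the electronic wave functions, with $H'$ interpreted in such a way that it does not act outside the integrals, i.e., $H'_{ij}$ are simple functions of $R$. With this definition, we can write Schrödinger's equation as

\[ -\frac{\hbar^2}{2\mu} \nabla^2_{\mathbf{R}} f_i(\mathbf{R}) + E_i(R) f_i(\mathbf{R}) - E f_i(\mathbf{R}) + \sum_j H'_{ij}(R) f_j(\mathbf{R}) - \frac{\hbar^2}{\mu} \sum_j \langle \psi_i | ( \nabla_{\mathbf{R}} \psi_j ) \rangle \cdot \nabla_{\mathbf{R}} f_j(\mathbf{R}) - \frac{\hbar^2}{2\mu_a} \sum_j \langle \psi_i | \sum_k \nabla_{\mathbf{r}_k} \psi_j \rangle \cdot \nabla_{\mathbf{R}} f_j(\mathbf{R}) = 0, \tag{17} \]

where we used the orthonormality of the electronic wave functions for each $R$ to obtain the first three terms. The last two terms will be written as $\sum_j \mathbf{B}_{ij}(R) \cdot \nabla_{\mathbf{R}} f_j(\mathbf{R})$, with

\[ \mathbf{B}_{ij}(R) \cdot \nabla_{\mathbf{R}} f_j(\mathbf{R}) = -\hbar^2 \left[ \frac{1}{\mu} \langle \psi_i | ( \nabla_{\mathbf{R}} \psi_j ) \rangle + \frac{1}{2\mu_a} \langle \psi_i | \sum_k \nabla_{\mathbf{r}_k} \psi_j \rangle \right] \cdot \nabla_{\mathbf{R}} f_j(\mathbf{R}). \]

We will now show that $\mathbf{B}_{ii}(R) = 0$ for real electronic functions (one can always choose electronic functions to be real; for a proof see Shankar, p. 177). This is because we have

\[ 0 = \nabla_{\mathbf{R}} \langle \psi_i | \psi_i \rangle = \langle \nabla_{\mathbf{R}} \psi_i | \psi_i \rangle + \langle \psi_i | \nabla_{\mathbf{R}} \psi_i \rangle = 2 \langle \psi_i | \nabla_{\mathbf{R}} \psi_i \rangle, \]

so that the first term is zero. The second term is zero since it is proportional to the expectation value of the momentum operator. The latter value is zero since for a real wave function the probability of finding momentum $P$ and $-P$ is the same (in one dimension $|\langle e^{ipx/\hbar} | \psi \rangle|^2 = |\langle e^{-ipx/\hbar} | \psi \rangle|^2$, and this result generalizes to any number of dimensions). Now we can move the off-diagonal terms to the right-hand side, getting

\[ \left[ -\frac{\hbar^2}{2\mu} \nabla^2_{\mathbf{R}} + E_i(R) + H'_{ii}(R) - E \right] f_i(\mathbf{R}) = -\sum_{j \neq i} \left[ H'_{ij} f_j(\mathbf{R}) + \mathbf{B}_{ij}(R) \cdot \nabla_{\mathbf{R}} f_j(\mathbf{R}) \right]. \tag{18} \]

Note that this equation is still equivalent to Schrödinger's equation. This set of coupled equations can be solved directly for very small molecules, but usually one solves it perturbatively, treating the right-hand side as a perturbation.

The last form of Schrödinger's equation is appropriate for making the approximations discussed above. Since the off-diagonal matrix elements are usually smaller than the diagonal ones, one obvious approximation is to neglect the right-hand side. This gives the adiabatic approximation. The resulting equation for $f_i(\mathbf{R})$ differs from the BO equation by the term $H'_{ii}(R)$, which is called the adiabatic or diagonal correction. Thus, the BO approximation differs from the adiabatic approximation by this correction. The adiabatic equation is of the same degree of difficulty as the BO equation since in each case nuclei move on a potential energy surface. Since the diagonal correction is usually small, in most current calculations it is neglected.

The adiabatic approximation fails when two potential energy surfaces $E_i(R)$ and $E_{i'}(R)$ become close to each other. Clearly, in such cases some off-diagonal matrix elements are not significantly smaller than the diagonal ones since there are two electronic wave functions which are similar. In such cases, one has to include at least the off-diagonal matrix elements that couple these states.

IV.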
THE INDEPENDENT-PARTICLE MODEL: THE HARTREE-FOCK METHOD Our problem to solve it the time-independent Schrödinger equation with the Hamiltonian Ĥ = 2 2m N i=1 N nuc 2 i a=1 N i=1 Z a e 2 r i R a N i<j e 2 r i r j (19) where m denotes electron s mass, e electron s charge, N is the number of electrons, N nuc is the number of nuclei, Z a is the charge of nucleus a, r i are positions of electrons, R a are positions of nuclei. Note that this Hamiltonian is the same as the Hamiltonian defined by Eq. (10) except that we neglected the nuclear-nuclear repulsion terms. These terms give just a constant in any type of electronic structure approach and this constant can be simply added to the final result. We also dropped the subscript 0" since this will be the only Hamiltonian considered from now on. Despite the simplification of eliminating nuclear degrees of freedom, the solution of the clamped-nuclei Schrödinger s equation for even simple molecules, such as the 18

water molecule with 10 electrons and 30 spatial degrees of freedom, appears to be an impossible task. The main idea for simplification that may come to mind is to solve such an equation one electron at a time, which is then a 3-dimensional problem. In the most straightforward approach, this would mean that one neglects all interactions between electrons in the Hamiltonian (19). With such an approximation, the problem rigorously separates into $N$ one-electron problems when the wave function is written as a product of one-electron functions. However, this straightforward independent-particle model works poorly. In particular, when an electron in a molecule or solid is far from a nucleus, it does not see an object of charge $Z_a$ since other electrons screen the nuclear charge. There were several efforts in the early days of quantum mechanics to scale nuclear charges to account for the screening. One step further is to include in the one-electron equation an interaction with an electron cloud representing an average of the electron positions, leading to a family of mean-field methods. It turns out there is a rigorous and systematic way of achieving the best possible representation of the mean field, called the Hartree-Fock (HF) method. The wave function in this method is an antisymmetrized product of one-electron functions and the method still requires solving only one-electron equations; however, the set of equations is coupled.

A. Slater determinant and antisymmetrizer

The wave function in the HF method is written in the form of a Slater determinant

$$\Psi(x_1,x_2,\dots,x_N) = \frac{1}{\sqrt{N!}}\begin{vmatrix} \phi_{k_1}(x_1) & \phi_{k_1}(x_2) & \cdots & \phi_{k_1}(x_N)\\ \phi_{k_2}(x_1) & \phi_{k_2}(x_2) & \cdots & \phi_{k_2}(x_N)\\ \vdots & \vdots & & \vdots\\ \phi_{k_N}(x_1) & \phi_{k_N}(x_2) & \cdots & \phi_{k_N}(x_N) \end{vmatrix} \tag{20}$$

where $x_i = \{\mathbf r_i, s_i\}$ denotes the spatial and spin coordinates of the $i$th electron and the single-electron functions $\phi_{k_i}(x_j)$ are called spinorbitals. The spin variable takes on the values $\pm\frac12$.
We will use only pure-spin spinorbitals, which means that a given $\phi_i(x)$ has to be zero at $s = \frac12$ and nonzero at $s = -\frac12$, or vice versa. The spinorbitals form a complete set of one-electron functions and the subset included in the Slater determinant is an arbitrary subset of such spinorbitals. Of course, as follows from the properties of determinants, all spinorbitals have to be different, otherwise the determinant is zero. As we will show soon, the normalization factor ensures that the determinant is normalized to 1 if the set of spinorbitals is orthonormal. The determinant can also be written as the result of the action of an operator $A$ called the antisymmetrizer,

$$A = \frac{1}{N!}\sum_{\sigma}(-1)^{\pi_\sigma}P_\sigma \tag{21}$$

where the sum is over all $N!$ permutations $P_\sigma$ of $N$ electrons. Since $P_\sigma$ acts now on electron coordinates, we call it an operator. The normalization factor ensures that the antisymmetrizer is idempotent, i.e., $A^2 = A$. This can be seen from the following:

$$A^2 = \frac{1}{(N!)^2}\sum_{\sigma,\sigma'}(-1)^{\pi_\sigma}(-1)^{\pi_{\sigma'}}P_\sigma P_{\sigma'}.$$

Consider a fixed value of $\sigma$. The set of products of $P_\sigma$ with all operators $P_{\sigma'}$ is equal to the set of all $S_N$ operators. Thus, as we sum over $\sigma$, we get $N!$ times the set of all permutation operators, and the result will be equal to $A$ if the signs are right. This is the case since if we write $P_{\sigma''} = P_\sigma P_{\sigma'}$ and expand $P_\sigma$ and $P_{\sigma'}$ into products of transpositions, we see that the number of transpositions in $P_{\sigma''}$ is the sum of the numbers of transpositions in $P_\sigma$ and $P_{\sigma'}$, so that $(-1)^{\pi_\sigma}(-1)^{\pi_{\sigma'}} = (-1)^{\pi_{\sigma''}}$.

Acting with $A$ on the product of spinorbitals,

$$\Psi(x_1,x_2,\dots,x_N) = \sqrt{N!}\,A\big(\phi_{k_1}(x_1)\phi_{k_2}(x_2)\dots\phi_{k_N}(x_N)\big),$$

we indeed get the Slater determinant since the antisymmetrizer realizes the definition of the determinant.

Let us now prove that the Slater determinant is normalized if the spinorbitals are orthonormal:

$$\langle\phi_i|\phi_j\rangle = \sum_s\int\phi_i^*(x)\phi_j(x)\,d^3r = \delta_{ij}$$

where we defined the bracket notation that will be used for integrals from now on. Notice that the bracket includes summation over the spins. This summation runs over $s = \frac12$ and $s = -\frac12$. If $i = j$, the spinorbital is nonzero at one of the values of $s$. Two spinorbitals may have the same spatial part, but differ by spin. Then for each value of $s$ one of the spinorbitals is zero, which satisfies orthonormality. Other pairs of different spinorbitals can be orthogonal already due to different spins and/or due to orthogonality of spatial components (usually one assumes, however, that different spatial parts are always orthogonal). The overlap integral can be written as

$$\langle\Psi|\Psi\rangle = N!\,\big\langle A\big(\phi_{k_1}(x_1)\phi_{k_2}(x_2)\dots\phi_{k_N}(x_N)\big)\big|A\big(\phi_{k_1}(x_1)\phi_{k_2}(x_2)\dots\phi_{k_N}(x_N)\big)\big\rangle$$

where the brackets denote the integral over space and spin coordinates of all electrons.
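The idempotency argument for the antisymmetrizer can be checked numerically for small $N$. The following is a minimal sketch, not part of the original notes: it represents $A$ as an element of the group algebra of $S_N$ (a dictionary mapping each permutation to its coefficient) and multiplies it by itself. The function names `parity`, `compose`, `antisymmetrizer`, and `multiply` are illustrative choices.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def parity(perm):
    """Return +1 for an even permutation and -1 for an odd one (cycle counting)."""
    seen = [False] * len(perm)
    sign = 1
    for start in range(len(perm)):
        if seen[start]:
            continue
        length = 0
        j = start
        while not seen[j]:
            seen[j] = True
            j = perm[j]
            length += 1
        if length % 2 == 0:  # a cycle of even length is an odd permutation
            sign = -sign
    return sign

def compose(p, q):
    """Permutation 'p after q': (p o q)(i) = p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(p)))

def antisymmetrizer(n):
    """A = (1/N!) sum_sigma (-1)^{pi_sigma} P_sigma, stored as {permutation: coefficient}."""
    norm = Fraction(1, factorial(n))
    return {p: norm * parity(p) for p in permutations(range(n))}

def multiply(a, b):
    """Product of two group-algebra elements."""
    out = {}
    for p, cp in a.items():
        for q, cq in b.items():
            r = compose(p, q)
            out[r] = out.get(r, 0) + cp * cq
    return out

A = antisymmetrizer(3)
print(multiply(A, A) == A)  # idempotency check, A^2 = A
```

Exact rational arithmetic (`Fraction`) is used so that the comparison $A^2 = A$ is not affected by rounding.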
Since $A$ is obviously Hermitian and we have shown that it is idempotent, we can move it to the ket, getting

$$\langle\Psi|\Psi\rangle = N!\,\big\langle\phi_{k_1}(x_1)\phi_{k_2}(x_2)\dots\phi_{k_N}(x_N)\big|A\big(\phi_{k_1}(x_1)\phi_{k_2}(x_2)\dots\phi_{k_N}(x_N)\big)\big\rangle.$$

Consider first the identity permutation in $A$. For this term, the integral separates into the product of $N$ one-electron integrals with the integrand in each one-electron integral

being the square modulus of a spinorbital. Thus, each such integral is 1. Now consider the term such that electron 1 is permuted with 2:

$$\big\langle\phi_{k_1}(x_1)\phi_{k_2}(x_2)\dots\phi_{k_N}(x_N)\big|\phi_{k_1}(x_2)\phi_{k_2}(x_1)\dots\phi_{k_N}(x_N)\big\rangle.$$

Now we get two integrals that are zero: $\langle\phi_{k_1}(x_1)|\phi_{k_2}(x_1)\rangle$ and $\langle\phi_{k_2}(x_2)|\phi_{k_1}(x_2)\rangle$. Thus, this contribution is zero. Clearly, any permutation of electrons in the ket leads to a zero term. Thus, $A$ reduces to $(N!)^{-1}I$ and $\langle\Psi|\Psi\rangle = 1$.

B. Slater-Condon rules

Matrix elements of the Hamiltonian (19) with Slater determinants can be written in terms of matrix elements between spinorbitals using the so-called Slater-Condon rules. Let us define the operators

$$h(i) = -\frac{\hbar^2}{2m}\nabla_i^2 - \sum_{a=1}^{N_{\rm nuc}}\frac{Z_a e^2}{|\mathbf r_i - \mathbf R_a|} \tag{22}$$

$$\hat F = \sum_{i=1}^{N}h(i) \tag{23}$$

$$\hat G = \sum_{i<j}^{N}\frac{e^2}{|\mathbf r_i - \mathbf r_j|} = \frac12\sum_{i\neq j}^{N}g(ij) \tag{24}$$

where we introduced a short-hand notation replacing $x_i$ by $i$. The rules for the operator $\hat F$ are

$$\langle\Psi|\hat F\Psi\rangle = \sum_{i=1}^{N}h_{ii} \tag{25}$$

$$\langle\Psi|\hat F\Psi'\rangle = h_{ik} \tag{26}$$

$$\langle\Psi|\hat F\Psi''\rangle = 0 \tag{27}$$

where $\Psi$ denotes the determinant built from the set of spinorbitals $\phi_1, \phi_2, \dots, \phi_N$, $\Psi'$ differs from $\Psi$ by replacement of the spinorbital $\phi_i$ by the spinorbital $\phi_k$, $k > N$, and $\Psi''$ includes two such replacements. We have also introduced a short-hand notation for the integrals, i.e., $h_{ik} = \langle\phi_i|h\phi_k\rangle$. The rules are actually valid for any set of orbitals, but we will need them only for the set specified.

The proof of Eq. (25) is as follows. Similarly as we did when proving the normalization of Slater determinants, we can move the antisymmetrizer, only now from ket to bra. We have to first commute it with the operator $\hat F$. This is possible since this operator is symmetric, i.e., does not change if we permute any electrons in it. Thus, we can pull $\hat F$

through the antisymmetrizer. Then using the Hermiticity and idempotency of $A$, we get

$$\langle\Psi|\hat F\Psi\rangle = N!\,\Big\langle A\big(\phi_1(1)\phi_2(2)\dots\phi_N(N)\big)\Big|\sum_i h(i)\,\phi_1(1)\phi_2(2)\dots\phi_N(N)\Big\rangle$$

$$= N!\,\Big\langle A\big(\phi_1(1)\phi_2(2)\dots\phi_N(N)\big)\Big|\big[h(1)\phi_1(1)\phi_2(2)\dots\phi_N(N) + \phi_1(1)h(2)\phi_2(2)\dots\phi_N(N) + \dots + \phi_1(1)\phi_2(2)\dots h(N)\phi_N(N)\big]\Big\rangle.$$

We can see that, similarly as in the proof of normalization, any permutation of the electrons in the bra will lead to a zero integral. Thus, $A$ reduces to $(N!)^{-1}I$, which proves Eq. (25).

In the case of Eq. (26), we will have in the ket one spinorbital, $\phi_k$, which is orthogonal to all spinorbitals in the bra. Thus, the integral involving this spinorbital can be nonzero only if $h$ acts on it. Moreover, in the bra one has to permute the electrons in such a way that spinorbital $\phi_i$ is the function of the same electron as $\phi_k$, since $\phi_i$ is orthogonal to all spinorbitals in the ket. Thus, only a single permutation in the bra survives, which proves Eq. (26).

In the case of Eq. (27), we will have in the ket two spinorbitals, $\phi_k$ and $\phi_{k'}$, which are absent in the bra. Only one of them can be acted upon by $h$, so that the other spinorbital will always make the integrals zero, which proves Eq. (27).

The analogous formulas involving $\hat G$ are (proofs in a homework):

$$\langle\Psi|\hat G\Psi\rangle = \frac12\sum_{i,j=1}^{N}\big(g_{ijij} - g_{ijji}\big) \tag{28}$$

$$\langle\Psi|\hat G\Psi'\rangle = \sum_{j=1}^{N}\big(g_{ijkj} - g_{ijjk}\big) \tag{29}$$

$$\langle\Psi|\hat G\Psi''\rangle = g_{ijkl} - g_{ijlk} \tag{30}$$

$$\langle\Psi|\hat G\Psi'''\rangle = 0 \tag{31}$$

where $\Psi'''$ denotes a triply substituted Slater determinant and

$$g_{ijkl} = \langle\phi_i\phi_j|g\,\phi_k\phi_l\rangle = \sum_{s_1,s_2}\int\!\!\int d^3r_1\,d^3r_2\;\phi_i^*(x_1)\phi_j^*(x_2)\frac{e^2}{|\mathbf r_1 - \mathbf r_2|}\phi_k(x_1)\phi_l(x_2).$$

C. Derivation of Hartree-Fock equations

The Hartree-Fock method seeks the Slater determinant $\Psi$ which minimizes the expectation value of the Hamiltonian

$$E_{\rm HF} = \min_{\Psi}\frac{\langle\Psi|H\Psi\rangle}{\langle\Psi|\Psi\rangle} \ge E_0.$$

The Ritz variational principle guarantees that $E_{\rm HF}$ is always greater than or equal to the exact ground-state energy $E_0$ of a given system. Since $\Psi$ is built from spinorbitals, the method finds the optimal spinorbitals for the ground state of a system and can be considered to be the ultimate mean-field method. We will always assume that the spinorbitals are orthonormal, so that $\Psi$ is normalized and we can write

$$E_{\rm HF} = \min_{\Psi}\langle\Psi|H\Psi\rangle.$$

Using the Slater rules for the Hamiltonian (19), the expectation value can be written as

$$\langle\Psi|H\Psi\rangle = \sum_{i=1}^{N}h_{ii} + \frac12\sum_{i,j=1}^{N}\big(g_{ijij} - g_{ijji}\big)$$

where we still assume that the set of spinorbitals is enumerated by $1,\dots,N$ and we included the $i = j$ term in the second sum since the two terms add to zero in this case.

To find the minimum, we have to vary each orbital. Since the orbitals are complex, one has to vary both the real part and the imaginary part. Equivalently, one can vary the orbital and its complex conjugate:

$$\phi_i \to \phi_i + \delta\phi_i, \qquad \phi_i^* \to \phi_i^* + \delta\phi_i^*.$$

We will start from varying the $\phi_i^*$'s only and we will find that this is sufficient to obtain a solvable set of equations. One can then check that varying the $\phi_i$'s gives an equivalent set of equations. Since we assumed that the spinorbitals are orthonormal, we have to impose the condition $\langle\phi_i|\phi_j\rangle = \delta_{ij}$ during the optimization. This can be done by adding to the expectation value the condition multiplied by Lagrange's undetermined multipliers. Thus, we will minimize

$$L = \langle\Psi|H\Psi\rangle - \sum_{i\le j=1}^{N}\lambda_{ij}\big(\langle\phi_i|\phi_j\rangle - \delta_{ij}\big).$$

Replacing all $\phi_i^*$ by $\phi_i^* + \delta\phi_i^*$, we get

$$L[\{\phi_i^* + \delta\phi_i^*\}] = \sum_i\langle\phi_i + \delta\phi_i|\hat h\phi_i\rangle + \frac12\sum_{ij}\big\langle(\phi_i + \delta\phi_i)(\phi_j + \delta\phi_j)\big|\hat g\,(\phi_i\phi_j - \phi_j\phi_i)\big\rangle - \sum_{i\le j}\lambda_{ij}\big(\langle\phi_i + \delta\phi_i|\phi_j\rangle - \delta_{ij}\big)$$

$$= \sum_i\langle\phi_i|\hat h\phi_i\rangle + \frac12\sum_{ij}\langle\phi_i\phi_j|\hat g\,(\phi_i\phi_j - \phi_j\phi_i)\rangle + \sum_i\langle\delta\phi_i|\hat h\phi_i\rangle + \frac12\sum_{ij}\langle\delta\phi_i\,\phi_j|\hat g\,(\phi_i\phi_j - \phi_j\phi_i)\rangle + \frac12\sum_{ij}\langle\phi_i\,\delta\phi_j|\hat g\,(\phi_i\phi_j - \phi_j\phi_i)\rangle - \sum_{i\le j}\lambda_{ij}\langle\delta\phi_i|\phi_j\rangle$$

where the first two terms give the value of the functional at the minimum and we have used the fact that the spinorbitals are orthonormal at the minimum, so there is no orthonormality term in this part. We have also omitted the terms that are products of orbital increments as they are of second order. The term with $\delta\phi_j$ can be shown to be equal to the preceding term:

$$\frac12\sum_{ij}\langle\phi_i\,\delta\phi_j|\hat g\,(\phi_i\phi_j - \phi_j\phi_i)\rangle = \frac12\sum_{ij}\langle\delta\phi_j\,\phi_i|\hat g\,(\phi_j\phi_i - \phi_i\phi_j)\rangle = \frac12\sum_{ij}\langle\delta\phi_i\,\phi_j|\hat g\,(\phi_i\phi_j - \phi_j\phi_i)\rangle$$

where in the first step we interchanged the coordinates of electrons 1 and 2 in the integral and in the second step we interchanged the summation indices. We can now write

$$L[\{\phi_i^* + \delta\phi_i^*\}] = E_{\rm HF}[\{\phi_i^*\}] + \sum_i\sum_{s_1}\int d^3r_1\,\delta\phi_i^*(x_1)\Big[\hat h(\mathbf r_1)\phi_i(x_1) + \sum_j\sum_{s_2}\int d^3r_2\,\phi_j^*(x_2)\,\hat g(\mathbf r_1,\mathbf r_2)\big(\phi_i(x_1)\phi_j(x_2) - \phi_j(x_1)\phi_i(x_2)\big) - \sum_j\lambda_{ij}\phi_j(x_1)\Big].$$

Since at the minimum the linear increment has to be zero for an arbitrary $\delta\phi_i^*$, this can be achieved only if the whole expression in the large square bracket is equal to zero for any $x_1$:

$$\hat h(\mathbf r_1)\phi_i(x_1) + \sum_j\sum_{s_2}\int d^3r_2\,\phi_j^*(x_2)\,\hat g(\mathbf r_1,\mathbf r_2)\big(\phi_i(x_1)\phi_j(x_2) - \phi_j(x_1)\phi_i(x_2)\big) = \sum_j\lambda_{ij}\phi_j(x_1).$$

These are the Hartree-Fock equations for spinorbitals. Let us rewrite this equation introducing the so-called Coulomb and exchange operators

$$\hat J(\mathbf r_1) = \sum_j\hat J_j(\mathbf r_1) = \sum_j\sum_{s_2}\int d^3r_2\,\phi_j^*(x_2)\,\hat g(\mathbf r_1,\mathbf r_2)\,\phi_j(x_2)$$

$$\hat K(\mathbf r_1)\phi(x_1) = \sum_j\hat K_j(\mathbf r_1)\phi(x_1) = -\sum_j\sum_{s_2}\int d^3r_2\,\phi_j^*(x_2)\,\hat g(\mathbf r_1,\mathbf r_2)\,\phi(x_2)\,\phi_j(x_1)$$

where $\phi(x_1)$ is an arbitrary spinorbital and the minus sign of the exchange term is included in the definition of $\hat K$. Note that while $\hat J$ is just a multiplicative operator, $\hat K$ is an integral one since it integrates over the function it acts upon. Notice that the operators $\hat J$ and $\hat K$ do not depend on spin but act on functions including the spin coordinate. Using these operators, one can rewrite the HF equations as

$$\big[\hat h(\mathbf r_1) + \hat J(\mathbf r_1) + \hat K(\mathbf r_1)\big]\phi_i(x_1) = \sum_j\lambda_{ij}\phi_j(x_1). \tag{32}$$

Equations (32) are a set of $N$ equations for $N$ spinorbitals $\phi_i$ depending also on $N(N+1)/2$ Lagrange's multipliers. It is possible to transform these equations to the so-called canonical form where only the diagonal multipliers are present. To achieve this

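goal, we will use the important theorem stating that Slater determinants are invariant under unitary transformations of spinorbitals (i.e., the determinant is the same when expressed in the original and in the transformed spinorbitals). Let us denote the transformed set of spinorbitals by $\phi'_i$, so that

$$\boldsymbol\phi' = \begin{pmatrix}\phi'_1\\ \phi'_2\\ \vdots\\ \phi'_N\end{pmatrix} = U\begin{pmatrix}\phi_1\\ \phi_2\\ \vdots\\ \phi_N\end{pmatrix} = U\boldsymbol\phi \qquad\text{with}\qquad UU^\dagger = I.$$

The Slater determinant built of transformed spinorbitals can be written as

$$\Psi' = \big|\phi'_1(x_1)\phi'_2(x_2)\dots\phi'_N(x_N)\big| = \begin{vmatrix}\sum_i u_{1i}\phi_i(1) & \sum_i u_{1i}\phi_i(2) & \cdots & \sum_i u_{1i}\phi_i(N)\\ \sum_i u_{2i}\phi_i(1) & \sum_i u_{2i}\phi_i(2) & \cdots & \sum_i u_{2i}\phi_i(N)\\ \vdots & \vdots & & \vdots\\ \sum_i u_{Ni}\phi_i(1) & \sum_i u_{Ni}\phi_i(2) & \cdots & \sum_i u_{Ni}\phi_i(N)\end{vmatrix}$$

$$= \left|\,U\begin{bmatrix}\phi_1(1) & \phi_1(2) & \cdots & \phi_1(N)\\ \phi_2(1) & \phi_2(2) & \cdots & \phi_2(N)\\ \vdots & \vdots & & \vdots\\ \phi_N(1) & \phi_N(2) & \cdots & \phi_N(N)\end{bmatrix}\right| = |U|\,\Psi$$

where $|\dots|$ denotes a determinant and $[\dots]$ a matrix, and where we started to use the short-hand notation $x_i \to i$. Since the determinant of a unitary matrix is a complex number of modulus 1, $|U|$ is just a multiplicative phase factor which is irrelevant.

If we apply the unitary transformation to the HF equations, i.e., replace all spinorbitals by transformed spinorbitals, this substitution has to be made also in the $\hat J$ and $\hat K$ operators. Let us now show that these operators are invariant under such a transformation. For $\hat J$ we have

$$\hat J[\{\phi'_i\}] = \sum_j\sum_{s_2}\int d^3r_2\,(\phi'_j)^*(2)\,\hat g(\mathbf r_1,\mathbf r_2)\,\phi'_j(2)$$

$$= \sum_j\sum_{s_2}\int d^3r_2\,\hat g(\mathbf r_1,\mathbf r_2)\sum_i u^*_{ji}\phi_i^*(2)\sum_k u_{jk}\phi_k(2)$$

$$= \sum_j\sum_{s_2}\int d^3r_2\,\hat g(\mathbf r_1,\mathbf r_2)\sum_i u^\dagger_{ij}\phi_i^*(2)\sum_k u_{jk}\phi_k(2)$$

$$= \sum_{s_2}\int d^3r_2\,\hat g(\mathbf r_1,\mathbf r_2)\sum_{i,k}\phi_i^*(2)\phi_k(2)\sum_j u^*_{ji}u_{jk}.$$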

The last sum is the same as in $U^\dagger U = I$, so that

$$\hat J[\{\phi'_i\}] = \sum_{s_2}\int d^3r_2\,\hat g(\mathbf r_1,\mathbf r_2)\sum_{i,k}\phi_i^*(2)\phi_k(2)\,\delta_{ik} = \sum_i\sum_{s_2}\int d^3r_2\,\hat g(\mathbf r_1,\mathbf r_2)\,\phi_i^*(2)\phi_i(2) = \hat J[\{\phi_i\}],$$

i.e., the Coulomb operator is indeed invariant under the unitary transformation of orbitals. An analogous proof holds for $\hat K$. Let us write the HF equations in matrix form,

$$\big[\hat h(\mathbf r) + \hat J(\mathbf r) + \hat K(\mathbf r)\big]\boldsymbol\phi = \boldsymbol\lambda\,\boldsymbol\phi,$$

rewrite them in transformed form using $\boldsymbol\phi = U^\dagger\boldsymbol\phi'$,

$$\big[\hat h(\mathbf r) + \hat J(\mathbf r) + \hat K(\mathbf r)\big]U^\dagger\boldsymbol\phi' = \boldsymbol\lambda\,U^\dagger\boldsymbol\phi',$$

and multiply this equation from the left by $U$ to get

$$\big[\hat h(\mathbf r) + \hat J(\mathbf r) + \hat K(\mathbf r)\big]UU^\dagger\boldsymbol\phi' = U\boldsymbol\lambda\,U^\dagger\boldsymbol\phi'.$$

Since the matrix $\boldsymbol\lambda$ is Hermitian (symmetric in the real case), we can always find a unitary transformation that diagonalizes it, getting

$$\big[\hat h(\mathbf r) + \hat J(\mathbf r) + \hat K(\mathbf r)\big]\boldsymbol\phi' = \boldsymbol\lambda_{\rm diag}\,\boldsymbol\phi'.$$

Denoting the diagonal elements of $\boldsymbol\lambda_{\rm diag}$ by $\epsilon_i$ and dropping the primes, we can write the canonical HF equations as

$$\big[\hat h(\mathbf r) + \hat J(\mathbf r) + \hat K(\mathbf r)\big]\phi_i(x) = \epsilon_i\phi_i(x), \qquad i = 1,2,\dots,N.$$

These equations are sometimes called pseudo-eigenvalue equations since the operators $\hat J$ and $\hat K$ depend on all $\phi_i$'s. The equations have to be solved iteratively, i.e., one first assumes some initial orbitals (for example, for atoms these can be the hydrogenic orbitals), solves the resulting eigenvalue problem, computes new operators $\hat J$ and $\hat K$ using the spinorbitals just obtained, and so on. The convergence of the iterations can sometimes be a problem and several methods have been developed to deal with such problems.

The most often used method of solving HF problems is to expand the spinorbitals in terms of some known basis functions $\chi_j$:

$$\phi_i = \sum_j c_{ij}\chi_j.$$

Then the Hartree-Fock equations become matrix pseudo-eigenvalue equations, sometimes called Hartree-Fock-Roothaan or Hartree-Fock-Roothaan-Hall equations. The basis functions can be in particular atomic orbitals and such an approach is then called the linear combination of atomic orbitals (LCAO) method. This approach can be used not only to

compute energies for atoms, molecules, and solids, but also as an interpretative tool. In particular, it can be used to interpret the chemical bond in molecules. In the simplest case of a diatomic molecule, if one restricts linear combinations to pairs of orbitals with similar energy, one of the two molecular orbital energies resulting from such a pair is lower than either atomic orbital energy whereas the other one is higher (this fact is not obvious). Thus, if the spinorbitals corresponding to the lower level are occupied (such spinorbitals are called the bonding ones), there is a gain in energy. This picture can also be applied to solids, where the orbital energies corresponding to different pairs of atoms will be very close to each other, forming the so-called bands. The relations between the highest filled or partly filled band and the lowest empty band determine whether a solid is an insulator, semiconductor, or conductor. However, for conductors the HF method encounters several problems.

V. SECOND-QUANTIZATION FORMALISM

In many-particle theory, one often uses the so-called formalism of second quantization, i.e., we express formulas in terms of operators that create or annihilate particles at some spinorbitals. The name of the formalism is somewhat misleading. It results from the first use of such operators to quantize the electromagnetic field. This happened in the late 1920s, after the formulation of Schrödinger's equation, which was the first quantization, i.e., the quantization of particles. Since we will consider only particles, this leads to a kind of oxymoron: we will use second-quantization tools in a first-quantization approach.

A. Annihilation and creation operators

Slater determinants can be described using the occupation number representation:

$$\Psi = \sqrt{N!}\,A\big[\phi_{k_1}(x_1)\phi_{k_2}(x_2)\dots\phi_{k_N}(x_N)\big] \equiv |n_1, n_2, \dots, n_i, \dots\rangle, \qquad \begin{cases} n_i = 1 & \text{if } i \in \{k_1,k_2,\dots,k_N\}\\ n_i = 0 & \text{otherwise}\end{cases}$$

where we assumed that spinorbitals form an ordered set.
Conventionally, we order spinorbitals coming from the HF method according to their orbital energies (for degenerate spinorbitals, the order is arbitrary). To have a unique relation to the standard way of writing the Slater determinant, we assume that the spinorbitals on the left-hand side are also ordered. The number of positions in the occupation number representation is infinite, but we can display explicitly only the sequence up to the highest occupied orbital. For example,

$$\Psi = \sqrt{5!}\,A\big[\phi_2(x_1)\phi_7(x_2)\phi_9(x_3)\phi_{10}(x_4)\phi_{11}(x_5)\big] = |0\,1\,0\,0\,0\,0\,1\,0\,1\,1\,1\rangle.$$
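The mapping from a list of occupied spinorbitals to the occupation number representation can be sketched in a few lines. This is an illustrative helper only (the name `occupation_string` is not from the notes); it uses 1-based spinorbital labels, as in the text, and displays the sequence up to the highest occupied spinorbital.

```python
def occupation_string(occupied):
    """Occupation numbers n_1 ... n_max for 1-based spinorbital indices."""
    occ = set(occupied)
    return "".join("1" if i in occ else "0" for i in range(1, max(occ) + 1))

# The determinant built from spinorbitals 2, 7, 9, 10, 11:
print(occupation_string([2, 7, 9, 10, 11]))  # prints 01000010111
```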

The creation operator can now be defined as

$$a_i^\dagger\Psi = a_i^\dagger|\dots 0_i\dots\rangle = (-1)^\sigma|\dots 1_i\dots\rangle, \qquad a_i^\dagger|\dots 1_i\dots\rangle = 0$$

where $\sigma$ denotes the number of 1's before the position $i$. In words, if spinorbital $\phi_i$ is absent in $\Psi$, this orbital is added at the $i$th place. If this orbital is present, the action of $a_i^\dagger$ gives zero. The phase factor $(-1)^\sigma$ is needed in order to create exactly the same determinant as would be built from the set of the spinorbitals present in $\Psi$ plus the spinorbital $\phi_i$. To see that the phase is correct, first add $\phi_i$ as the first row to $\Psi$. This gives us a uniquely defined determinant, but it may differ by phase from the standard determinant. We can then permute this row with other rows until it arrives at the $i$th position. This gives the phase factor $(-1)^\sigma$.

We can use the creation operators to define the ground-state determinant of a system with $N$ electrons:

$$\Phi_0 = a_1^\dagger a_2^\dagger\dots a_N^\dagger|{\rm vac}\rangle.$$

A homework exercise will show that the order chosen gives the correct ground-state determinant. In the case of a closed-shell system (e.g., atoms with complete shells occupied), this definition gives a unique determinant. For open-shell systems where the highest occupied spinorbital is degenerate, several determinants can be created. In most cases, a linear combination of such determinants is required to form a wave function with proper symmetry properties.

The annihilation operator is defined analogously to the creation operator:

$$a_i\Psi = a_i|\dots 1_i\dots\rangle = (-1)^\sigma|\dots 0_i\dots\rangle, \qquad a_i|\dots 0_i\dots\rangle = 0$$

so that we may consider this action as first permuting $\phi_i$ until it arrives at the first row and then annihilating it.

B. Products and commutators of operators

If several creation operators act in sequence, the order is important since these operators do not commute. Let us find the commutation rule by acting with a pair of creation operators on a state with zero electrons. We call such a state the true vacuum and denote it by $|{\rm vac}\rangle$. We get (assuming without loss of generality that $i < j$):

$$a_i^\dagger a_j^\dagger|{\rm vac}\rangle = a_i^\dagger|\dots 1_j\dots\rangle = |\dots 1_i\dots 1_j\dots\rangle$$

$$a_j^\dagger a_i^\dagger|{\rm vac}\rangle = a_j^\dagger|\dots 1_i\dots\rangle = -|\dots 1_i\dots 1_j\dots\rangle.$$

Thus, since $a_i^\dagger a_i^\dagger = 0$ from the definition, we have

$$a_i^\dagger a_j^\dagger = -a_j^\dagger a_i^\dagger \qquad\text{or}\qquad \big[a_i^\dagger, a_j^\dagger\big]_+ = 0,$$

where $[a,b]_+ = ab + ba$ is the anticommutator. Thus, the creation operators anticommute. We have considered here the action of $a_i^\dagger a_j^\dagger$ on the vacuum state only, but one can easily see that the result is the same for any determinant, since if either $\phi_i$ or $\phi_j$ is included in the determinant, we get zero. In the opposite case, the reasoning is the same as for the vacuum case.

Clearly, the commutation rule for annihilation operators is analogous to that for the creation operators:

$$\big[a_i, a_j\big]_+ = 0.$$

To show this, we have to act on a determinant containing both $\phi_i$ and $\phi_j$, and the minus sign results analogously to the case of creation operators.

Let us now find the commutation rules for products of creation and annihilation operators:

$$a_i^\dagger a_i\Psi = \begin{cases}0 & \text{if } i\notin\Psi\\ \Psi & \text{if } i\in\Psi.\end{cases}$$

Note that the phase factors cancel: $(-1)^\sigma(-1)^\sigma = 1$. Analogously,

$$a_i a_i^\dagger\Psi = \begin{cases}0 & \text{if } i\in\Psi\\ \Psi & \text{if } i\notin\Psi.\end{cases}$$

Thus, in the anticommutator, one of the two terms will always reproduce $\Psi$, so that

$$\big[a_i, a_i^\dagger\big]_+ = 1.$$

Finally, consider (again assuming $i < j$)

$$a_i^\dagger a_j\Psi = \begin{cases}0 & \text{if } j\notin\Psi \text{ or } i\in\Psi\\ (-1)^{\sigma_j}(-1)^{\sigma_i}|\dots 1_i\dots 0_j\dots\rangle & \text{otherwise}\end{cases}$$

$$a_j a_i^\dagger\Psi = \begin{cases}0 & \text{if } j\notin\Psi \text{ or } i\in\Psi\\ (-1)^{\sigma_j+1}(-1)^{\sigma_i}|\dots 1_i\dots 0_j\dots\rangle & \text{otherwise}\end{cases}$$

where the additional power of $-1$ results from the fact that the number of occupied states before position $j$ increases by one due to the action of $a_i^\dagger$. Thus, the action in the opposite order gives the same result times $-1$, so that we have

$$\big[a_i, a_j^\dagger\big]_+ = \delta_{ij}.$$
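The anticommutation rules can be verified by brute force in a small Fock space. The sketch below is an illustration under stated assumptions: it uses a toy space of `M = 4` spinorbitals, 0-based indices, states stored as dictionaries `{occupation tuple: coefficient}`, and the phase convention $(-1)^\sigma$ with $\sigma$ the number of occupied positions before $i$. The names `apply_op`, `anticommutator`, and `check_rules` are hypothetical.

```python
from itertools import product

M = 4  # number of spinorbitals in this toy Fock space (an assumption of the sketch)

def apply_op(i, create, state):
    """Apply a_i^dagger (create=True) or a_i (create=False) to {occupation: coefficient}."""
    out = {}
    for occ, coeff in state.items():
        if occ[i] == int(create):      # creating on occupied / annihilating empty gives zero
            continue
        sign = (-1) ** sum(occ[:i])    # (-1)^sigma, sigma = number of 1's before position i
        new = occ[:i] + (int(create),) + occ[i + 1:]
        out[new] = out.get(new, 0) + sign * coeff
    return out

def anticommutator(i, ci, j, cj, state):
    """[X_i, Y_j]_+ acting on state; ci and cj select creation (True) or annihilation (False)."""
    total = dict(apply_op(i, ci, apply_op(j, cj, state)))
    for occ, coeff in apply_op(j, cj, apply_op(i, ci, state)).items():
        total[occ] = total.get(occ, 0) + coeff
    return {occ: c for occ, c in total.items() if c != 0}

def check_rules():
    """Check the three anticommutation rules on every basis determinant."""
    for occ in product((0, 1), repeat=M):
        state = {occ: 1}
        for i in range(M):
            for j in range(M):
                if anticommutator(i, True, j, True, state):      # [a_i^+, a_j^+]_+ = 0
                    return False
                if anticommutator(i, False, j, False, state):    # [a_i, a_j]_+ = 0
                    return False
                expected = state if i == j else {}
                if anticommutator(i, False, j, True, state) != expected:  # delta_ij
                    return False
    return True

print(check_rules())
```

Since the operators are linear, checking the rules on all basis determinants is sufficient for the whole toy space.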

C. Hamiltonian and number operator

Since linear combinations of products of creation and annihilation operators can be used to construct an arbitrary wave function in the Hilbert space, one can use such linear combinations to construct various operators. Let us first construct a simple operator called the occupation number operator

$$\hat N_i = a_i^\dagger a_i.$$

As shown above, this operator acting on any determinant gives 0 if spinorbital $i$ is absent and recovers the determinant when it is present. We may say therefore that the eigenvalues of this operator are 0 and 1 and these eigenvalues are the occupation numbers for $\phi_i$ in $\Psi$:

$$\hat N_i\Psi = n_i\Psi.$$

We can now construct the number operator

$$\hat N = \sum_{i=1}^{\infty}\hat N_i$$

which gives

$$\hat N\Psi = \sum_{i=1}^{\infty}n_i\Psi = N\Psi.$$

Thus, this operator "detects" the number of electrons in $\Psi$.

Let us now consider the one-electron part of the Hamiltonian

$$\hat F = \sum_{i=1}^{N}\hat f(\mathbf r_i).$$

We postulate that $\hat F$ can be written as

$$\hat F = \sum_{i,j=1}^{\infty}f_{ij}\,a_i^\dagger a_j$$

where, as before, $f_{ij} = \langle\phi_i|\hat f\phi_j\rangle$. To prove this expression, it is sufficient to show that the matrix elements of this operator with arbitrary determinants are the same as those resulting from the Slater-Condon rules:

$$\langle\Psi|\hat F\Psi\rangle = \sum_{i,j=1}^{\infty}f_{ij}\langle\Psi|a_i^\dagger a_j\Psi\rangle = \sum_{i,j\in\Psi}f_{ij}\,\delta_{ij} = \sum_{i\in\Psi}f_{ii}.$$

We could reduce the sum to go only over indices of the orbitals present in $\Psi$ since if $j$ is not in the range, then the action of $a_j$ gives zero, whereas if $i$ is not in the range,

the determinant created in the ket is orthogonal to the one in the bra. For indices in the range, the action of $a_i^\dagger a_j$ with $i \neq j$, as discussed earlier, gives either zero or a determinant different from the original one, which makes the matrix element equal to zero. Thus, the only case when the matrix element is nonzero is $i = j$ and then $a_i^\dagger a_i$ is just the occupation number operator. This proves the theorem for the same determinant on both sides.

Now consider the case when spinorbital $k$ in the ket is replaced by spinorbital $l$, giving the determinant $\Psi'$. All the arguments from the previous case still hold, except that now $j$ has to label a spinorbital present in $\Psi'$ and $i$ a spinorbital present in $\Psi$, otherwise we get zero:

$$\langle\Psi|\hat F\Psi'\rangle = \sum_{i,j=1}^{\infty}f_{ij}\langle\Psi|a_i^\dagger a_j\Psi'\rangle = \sum_{i\in\Psi,\;j\in\Psi'}f_{ij}\langle\Psi|a_i^\dagger a_j\Psi'\rangle = f_{kl}.$$

In this case, $j$ has to be equal to $l$ to annihilate the replacement spinorbital and $i$ has to be equal to $k$ to create $\phi_k$ in $\Psi'$, which gives just a single matrix element. The overall sign is plus since the spinorbitals are annihilated and created at the same position. For the $\Psi''$ case, $a_i^\dagger a_j$ is unable to annihilate two replacement orbitals, so the result is zero. Thus, the second-quantized form of $\hat F$ gives the same matrix elements as the first-quantized form and therefore the two forms are equivalent.

The proof of a similar expression for the operator $\hat G$ is left as a homework problem:

$$\hat G = \frac12\sum_{i,j,k,l=1}^{\infty}g_{ijkl}\,a_i^\dagger a_j^\dagger a_l a_k.$$

Notice that the order of indices in the matrix element is different from that in the string of operators.

D. Normal products and Wick's theorem

1. Normal-Product

The normal-product of creation and annihilation operators is defined as the rearranged product of these operators such that all creation operators are to the left of all annihilation operators, with a phase factor corresponding to the parity of the permutation producing the rearrangement. For an arbitrary product of creation and annihilation operators $ABC\dots$, the normal-product is denoted as $n[ABC\dots]$ and is given as

$$n[ABC\dots] = (-1)^\sigma a^\dagger b^\dagger\dots uv\dots$$

where

$$a^\dagger b^\dagger\dots uv\dots = \hat P(ABC\dots),$$

$\hat P$ being the permutation of the operators $A, B, C, \dots$ and $\sigma$ being the parity of the permutation. This definition is not unique, since any rearrangement of the creation operators among

themselves and/or the annihilation operators among themselves is permissible but would always be accompanied by an appropriate change in the phase factor; thus all forms of a normal-product are equivalent. Examples are as follows:

$$n[a^\dagger b] = a^\dagger b, \qquad n[ab^\dagger] = -b^\dagger a, \qquad n[ab] = ab = -ba, \qquad n[a^\dagger b^\dagger] = a^\dagger b^\dagger = -b^\dagger a^\dagger$$

$$n[a^\dagger b c^\dagger d] = -a^\dagger c^\dagger bd = a^\dagger c^\dagger db = c^\dagger a^\dagger bd = -c^\dagger a^\dagger db.$$

The usefulness of the normal-product form is that its physical vacuum expectation value is zero:

$$\langle{\rm vac}|n[ABC\dots]|{\rm vac}\rangle = 0 \qquad\text{if } [ABC\dots] \text{ is not empty.}$$

2. Contractions (Pairings)

In order to be able to compute expectation values of general operator strings, we will take advantage of Wick's theorem. In order to be able to formulate it, we need to define the contraction (or pairing) of operators. For a pair of creation or annihilation operators $A, B$, we define their contraction as

$$\wick{\c A \c B} \equiv AB - n[AB].$$

Specifically, the four possibilities are:

$$\wick{\c a^\dagger \c b^\dagger} = a^\dagger b^\dagger - a^\dagger b^\dagger = 0, \qquad \wick{\c a \c b} = ab - ab = 0,$$

$$\wick{\c a^\dagger \c b} = a^\dagger b - a^\dagger b = 0, \qquad \wick{\c a \c b^\dagger} = ab^\dagger - (-b^\dagger a) = [a, b^\dagger]_+ = \delta_{ab}.$$

A normal-product with contractions is defined as follows:

$$n[\wick{ABC\dots \c1 R\dots \c2 S\dots \c1 T\dots \c2 V\dots}] = (-1)^\sigma\,\wick{\c R \c T}\;\wick{\c S \c V}\dots n[ABC\dots]$$

where all the contracted pairs have been put in front of the normal-product and $\sigma$ is the parity of the permutation.

3. Time-independent Wick's theorem

A product of a string of creation and annihilation operators is equal to their normal-product plus the sum of all possible normal-products with contractions. Symbolically,

$$ABCD\dots = n[ABCD\dots] + n[\wick{\c A \c B CD\dots}] + n[\wick{\c A B \c C D\dots}] + n[\wick{\c A BC \c D\dots}] + \dots + n[\wick{A \c B \c C D\dots}] + n[\wick{A \c B C \c D\dots}]$$

$$+\; n[\wick{AB \c C \c D\dots}] + n[\wick{\c1 A \c1 B \c2 C \c2 D\dots}] + n[\wick{\c1 A \c2 B \c1 C \c2 D\dots}] + n[\wick{\c1 A \c2 B \c2 C \c1 D\dots}] + \dots$$

Thus, all possible contractions of one pair, two pairs, etc. are included. The importance of the above result is that the vacuum expectation value of any normal-product with contractions is zero unless all operators are contracted. The reason is that each contraction contributes a factor of zero or 1 and, if an uncontracted normal-product remains, its vacuum expectation value is zero.

For example, consider $a^\dagger bc^\dagger de^\dagger f$. Applying Wick's theorem we get

$$a^\dagger bc^\dagger de^\dagger f = n[a^\dagger bc^\dagger de^\dagger f] + n[\wick{a^\dagger \c b \c c^\dagger de^\dagger f}] + n[\wick{a^\dagger \c b c^\dagger d \c e^\dagger f}] + n[\wick{a^\dagger bc^\dagger \c d \c e^\dagger f}] + n[\wick{a^\dagger \c1 b \c1 c^\dagger \c2 d \c2 e^\dagger f}]$$

where we have omitted all contractions except those of the form $\wick{\c a \c b^\dagger}$, since they vanish. Since no fully contracted term survives, the vacuum expectation value of this operator product is zero. A more complex example, $ab^\dagger cd^\dagger ef^\dagger$, is given as a homework.

4. Outline of proof of Wick's theorem

In a normal-ordered product $p^\dagger q^\dagger\dots uv$ all contractions vanish, since in such a product there can be no contractions involving an annihilation operator to the left of a creation operator. Thus, if a string of operators is already in normal-product form we have

$$p^\dagger q^\dagger\dots uv = n[p^\dagger q^\dagger\dots uv] + (\text{all possible contractions})$$

since all terms in the sum vanish. Thus Wick's theorem holds in this case. Consider next the case where one pair of operators is out of normal order:

$$p^\dagger q^\dagger\dots rs^\dagger\dots uv = p^\dagger q^\dagger\dots\big([r, s^\dagger]_+ - s^\dagger r\big)\dots uv = p^\dagger q^\dagger\dots\delta_{rs}\dots uv - p^\dagger q^\dagger\dots s^\dagger r\dots uv$$

$$= n[\wick{p^\dagger q^\dagger\dots \c r \c s^\dagger\dots uv}] + n[p^\dagger q^\dagger\dots rs^\dagger\dots uv].$$

All other contractions vanish, so Wick's theorem still holds. Now consider the case where we have two annihilation operators to the left of one of the creation operators:

$$p^\dagger q^\dagger\dots rst^\dagger\dots uv = p^\dagger q^\dagger\dots r\,\wick{\c s \c t^\dagger}\dots uv - p^\dagger q^\dagger\dots rt^\dagger s\dots uv$$

$$= p^\dagger q^\dagger\dots r\,\wick{\c s \c t^\dagger}\dots uv - p^\dagger q^\dagger\dots\wick{\c r \c t^\dagger}\,s\dots uv + p^\dagger q^\dagger\dots t^\dagger rs\dots uv$$

$$= n[\wick{p^\dagger q^\dagger\dots r \c s \c t^\dagger\dots uv}] + n[\wick{p^\dagger q^\dagger\dots \c r s \c t^\dagger\dots uv}] + n[p^\dagger q^\dagger\dots rst^\dagger\dots uv]$$

again satisfying Wick's theorem, since all other contractions vanish. This procedure can be continued for all pairs of operators out of normal order.
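The statement that a vacuum expectation value equals the sum of all fully contracted terms can be tested numerically on short operator strings. The sketch below is an illustration, not the notes' own procedure: operators are pairs `(index, create)` with 0-based indices, `direct_vev` applies the string to the vacuum in an occupation-number basis, and `wick_vev` sums all full contractions recursively, pairing the first operator with each later one and attaching the sign $(-1)^k$ for the $k$ operators standing between them. All function names are hypothetical.

```python
def apply_op(index, create, state):
    """Apply a_index^dagger (create=True) or a_index (create=False) to {occupation: coefficient}."""
    out = {}
    for occ, coeff in state.items():
        if occ[index] == int(create):
            continue
        sign = (-1) ** sum(occ[:index])   # phase from occupied positions before index
        new = occ[:index] + (int(create),) + occ[index + 1:]
        out[new] = out.get(new, 0) + sign * coeff
    return out

def direct_vev(ops, n_orb):
    """<vac| ops |vac> by explicit action on the vacuum; the rightmost operator acts first."""
    vac = (0,) * n_orb
    state = {vac: 1}
    for index, create in reversed(ops):
        state = apply_op(index, create, state)
    return state.get(vac, 0)

def wick_vev(ops):
    """<vac| ops |vac> as the sum of all fully contracted terms."""
    if not ops:
        return 1
    if len(ops) % 2:
        return 0
    (i, ci), rest = ops[0], ops[1:]
    total = 0
    for k, (j, cj) in enumerate(rest):
        # only an annihilator to the left of a creator gives a nonzero contraction, delta_ij
        if not ci and cj and i == j:
            total += (-1) ** k * wick_vev(rest[:k] + rest[k + 1:])
    return total

C, A = True, False                        # creator / annihilator flags
ops = ((0, A), (1, A), (1, C), (0, C))    # a_0 a_1 a_1^+ a_0^+
print(wick_vev(ops), direct_vev(ops, 2))
```

The two numbers printed should agree; the same comparison can be repeated for any other short string of operators.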

5. Comprehensive proof of Wick's theorem

We shall prove this theorem in three steps. We first prove a lemma L1 which expresses an arbitrary normal-product, multiplied on the right by a single operator, in terms of the normal-product and normal-products with pairings of all the operators involved. Next, we shall generalize this lemma (L2) to normal-products with contractions and, finally, we shall use the two lemmas to prove the general theorem.

L1:

$$n[M_1M_2\dots M_k]\,M_l = n[M_1M_2\dots M_kM_l] + \sum_{i=1}^{k}n[\wick{M_1\dots \c M_i\dots M_k \c M_l}]$$

If $M_l$ is an annihilator, say $a$, then all the normal-products with contractions appearing on the right-hand side vanish, and we get

$$n[M_1M_2\dots M_k]\,a = n[M_1M_2\dots M_k\,a]$$

since $a$, being an annihilator, can be taken inside the normal-product. We can thus assume that $M_l$ is a creator, say $b^\dagger$. Moreover, without any loss of generality, we can further assume that all the operators $M_i$, $i = 1,\dots,k$, are annihilators, since one can easily extend this special case to the general case as follows. We simply multiply both sides of L1 from the left by the product of the pertinent creation operators. These, being to the left of all the other operators, may be brought inside all the normal-products. Then we can add to the right-hand side the terms in which these added creators are contracted, one by one, with the last operator in the product, $b^\dagger$. All these terms vanish, since a contraction of two creators vanishes, and so they will not change the validity of our identity. Finally, we may rearrange the order of the operators $M_i$, $i = 1,\dots,k$, in each normal-product as desired. Moreover, any permutation of the operators $M_i$, $i = 1,\dots,k$, will not reverse the ordering of the contracted pairs, since the only contraction present is with the rightmost operator in the product, which is not affected by the permutation.
It thus remains to prove L1 for the special case in which $M_1, M_2, \dots, M_k$ are annihilators, say $a_1, a_2, \dots, a_k$, and $M_l$ is the creator $b^\dagger$, i.e.,

$$n[a_1a_2\dots a_k]\,b^\dagger = n[a_1a_2\dots a_kb^\dagger] + \sum_{i=1}^{k}n[\wick{a_1\dots \c a_i\dots a_k \c b^\dagger}] \tag{33}$$

We can now use induction, since the above equation is clearly valid for $k = 1$:

$$n[a_1]\,b^\dagger = n[a_1b^\dagger] + n[\wick{\c a_1 \c b^\dagger}], \qquad\text{i.e.,}\qquad a_1b^\dagger = -b^\dagger a_1 + \delta_{a_1b},$$

which is just the anticommutation relation. We suppose that Eq. (33) is valid for $k = N \ge 1$ and prove that it is also valid for $k = N + 1$.

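To do so, we multiply Eq. (33), written for $k = N$, by an arbitrary annihilator, say $a_0$, from the left:

$$a_0\,n[a_1a_2\dots a_N]\,b^\dagger = a_0\,n[a_1a_2\dots a_Nb^\dagger] + \sum_{i=1}^{N}a_0\,n[\wick{a_1\dots \c a_i\dots a_N \c b^\dagger}] \tag{34}$$

Consider now the left-hand side of Eq. (34). Since all the operators in the normal-product are annihilators, we can bring $a_0$ into the normal-product:

$$a_0\,n[a_1a_2\dots a_N]\,b^\dagger = n[a_0a_1a_2\dots a_N]\,b^\dagger \tag{35}$$

Similarly, we can rewrite all the terms under the summation symbol on the right-hand side of Eq. (34), obtaining

$$\sum_{i=1}^{N}a_0\,n[\wick{a_1\dots \c a_i\dots a_N \c b^\dagger}] = \sum_{i=1}^{N}n[\wick{a_0a_1\dots \c a_i\dots a_N \c b^\dagger}] \tag{36}$$

Finally, the first term on the right-hand side of Eq. (34) can be rearranged to the form

$$a_0\,n[a_1a_2\dots a_Nb^\dagger] = (-1)^N a_0b^\dagger\,n[a_1a_2\dots a_N] = (-1)^N\big(n[a_0b^\dagger] + \wick{\c a_0 \c b^\dagger}\big)\,n[a_1a_2\dots a_N] \tag{37}$$

where in the last equality we have used the definition of the contraction. Furthermore,

$$n[a_0b^\dagger]\,n[a_1a_2\dots a_N] = -b^\dagger a_0a_1\dots a_N = -n[b^\dagger a_0a_1\dots a_N]$$

$$\wick{\c a_0 \c b^\dagger}\,n[a_1a_2\dots a_N] = n[\wick{\c a_0 \c b^\dagger a_1a_2\dots a_N}] = (-1)^N n[\wick{\c a_0 a_1\dots a_N \c b^\dagger}].$$

Substituting the above two equations into Eq. (37), we get

$$a_0\,n[a_1a_2\dots a_Nb^\dagger] = (-1)^{N+1}n[b^\dagger a_0a_1\dots a_N] + (-1)^{2N}n[\wick{\c a_0 a_1\dots a_N \c b^\dagger}]$$

$$a_0\,n[a_1a_2\dots a_Nb^\dagger] = n[a_0a_1\dots a_Nb^\dagger] + n[\wick{\c a_0 a_1\dots a_N \c b^\dagger}] \tag{38}$$

Now substituting Eqs. (35), (36), and (38) into Eq. (34), we finally get

$$n[a_0a_1a_2\dots a_N]\,b^\dagger = n[a_0a_1\dots a_Nb^\dagger] + n[\wick{\c a_0 a_1\dots a_N \c b^\dagger}] + \sum_{i=1}^{N}n[\wick{a_0a_1\dots \c a_i\dots a_N \c b^\dagger}]$$

$$n[a_0a_1a_2\dots a_N]\,b^\dagger = n[a_0a_1\dots a_Nb^\dagger] + \sum_{i=0}^{N}n[\wick{a_0a_1\dots \c a_i\dots a_N \c b^\dagger}] \tag{39}$$

We now generalize L1 to the case of normal-products with contractions.

L2:

$$n[\wick{M_1M_2\dots \c1 M_i\dots \c1 M_{i'}\dots M_k}]\,M_l = n[\wick{M_1M_2\dots \c1 M_i\dots \c1 M_{i'}\dots M_kM_l}] + \sum_{j=1,\,j\notin C}^{k}n[\wick{M_1M_2\dots \c1 M_i\dots \c1 M_{i'}\dots \c2 M_j\dots M_k \c2 M_l}]$$

Here $M_i$ and $M_{i'}$ stand for a typical pair of operators already contracted inside the normal-product,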

where $C$ designates the index set of those operators $M_i$, $i = 1, \dots, k$, which are already contracted in the normal product on the left-hand side. Thus in the normal product on the left-hand side and in the first term on the right-hand side only the operators $M_i$, $i \in C$, are contracted, while in the last term there is an additional contraction involving the last operator $M_l$ and one of the yet unpaired operators $M_j$, $j = 1, \dots, k$; $j \notin C$. The proof of this lemma is easy: every contraction that is already present is just a number and can be taken out of the normal product on the left-hand side, in the first term on the right-hand side, and in each term of the sum (where only the contraction with $M_l$ stays inside). What remains is L1 for the uncontracted operators. In particular, L2 reduces to L1 when $C$ is empty.

We are now ready to prove Wick's theorem. We shall again use mathematical induction. From the definition of a contraction, the theorem holds for $N = 2$ (contracted operators are underlined, with a double underline for a second contracted pair),

$$M_1 M_2 = n[M_1 M_2] + n[\underline{M_1 M_2}] = n[M_1 M_2] + \underline{M_1 M_2},$$

and, trivially, for $N = 1$. We thus assume its validity for $N \ge 2$ and prove that it is then also valid for $N + 1$. Indeed, multiplying Wick's theorem for $N$ operators by an arbitrary creation or annihilation operator $M_{N+1}$ from the right, we obtain

$$\begin{aligned} M_1 M_2 \dots M_N M_{N+1} ={}& n[M_1 M_2 \dots M_N]\, M_{N+1} + \sum_{1 \le i < j \le N} n[M_1 \dots \underline{M_i} \dots \underline{M_j} \dots M_N]\, M_{N+1} + \dots \\ &+ \sum_{k} n[\underline{M_1 \dots M_{k-1}}\, M_k\, \underline{M_{k+1} \dots M_N}]\, M_{N+1} + \left( n[\underline{M_1 \dots M_N}]\, M_{N+1} \right) \end{aligned} \tag{40}$$

where the last term (enclosed in parentheses), in which all the operators inside the normal product are contracted, is present only if $N$ is even, while the last unbracketed term, in which all but one operator are paired in the normal product, is the last term when $N$ is odd. Using now L1 to express the first term on the right-hand side of Eq. (40) and, similarly, L2 for all the subsequent terms, we get

$$\begin{aligned} M_1 M_2 \dots M_N M_{N+1} ={}& n[M_1 M_2 \dots M_N M_{N+1}] + \sum_{i=1}^{N} n[M_1 \dots \underline{M_i} \dots M_N \underline{M_{N+1}}] \\ &+ \sum_{1 \le i < j \le N} n[M_1 \dots \underline{M_i} \dots \underline{M_j} \dots M_N M_{N+1}] \\ &+ \sum_{1 \le i < j \le N} \sum_{k \ne i,j} n[M_1 \dots \underline{M_i} \dots \underline{M_j} \dots \underline{\underline{M_k}} \dots M_N \underline{\underline{M_{N+1}}}] + \dots \\ &+ \sum_{k} n[\underline{M_1 \dots M_{k-1}}\, M_k\, \underline{M_{k+1} \dots M_N}\, M_{N+1}] + \sum_{k} n[\underline{M_1 \dots M_{k-1}}\, \underline{\underline{M_k}}\, \underline{M_{k+1} \dots M_N}\, \underline{\underline{M_{N+1}}}] \\ &+ \left( n[\underline{M_1 \dots M_N}\, M_{N+1}] \right) \end{aligned} \tag{41}$$

The first two terms on the right-hand side of Eq. (41) originate from the first term on the right-hand side of Eq. (40), the third and fourth terms of Eq. (41) originate from the

second term of Eq. (40). For $N$ even, the terms in the last line contain all possible normal products with all but one operator contracted, since $N + 1$ is then odd. Conversely, for odd $N$, Eq. (41) contains all possible fully contracted terms. Consequently, Wick's theorem also holds for $N + 1$ operators and, thus, in general.

6. Particle-hole formalism

Instead of referring all SDs and their matrix elements back to the vacuum state,

$$|I\rangle = |a_1 a_2 \dots a_N\rangle = a_1^\dagger a_2^\dagger \dots a_N^\dagger |\mathrm{vac}\rangle,$$

it is more convenient to begin with a fixed reference state, also called the Fermi vacuum (in contrast with the physical vacuum $|\mathrm{vac}\rangle$),

$$|0\rangle \equiv |\Phi_0\rangle = |ijk \dots n\rangle,$$

and define other SDs relative to it, e.g.,

$$|\Phi_i^a\rangle \equiv a^\dagger i\, |\Phi_0\rangle = |ajk \dots n\rangle \quad \text{(single excitation)},$$

$$|\Phi_{ij}^{ab}\rangle \equiv a^\dagger b^\dagger j i\, |\Phi_0\rangle = |abk \dots n\rangle \quad \text{(double excitation)},$$

$$|\Phi_i\rangle \equiv i\, |\Phi_0\rangle = |jk \dots n\rangle \quad \text{(electron removal)},$$

$$|\Phi^a\rangle \equiv a^\dagger |\Phi_0\rangle = |aijk \dots n\rangle \quad \text{(electron attachment)},$$

etc. Notice also that

$$\Phi_{ij}^{ab} = \Phi_{ji}^{ba} = -\Phi_{ij}^{ba} = -\Phi_{ji}^{ab}.$$

The spinorbitals $i, j, k, \dots, n$ occupied in $|0\rangle$ are called hole states (they appear explicitly only when an electron is excited out of them by, e.g., $i$, creating a hole in the reference state), while the other spinorbitals $a, b, \dots$ are called particle states. We shall use the letters $i, j, k, \dots$ to indicate indices restricted to hole states, the letters $a, b, c, \dots$ to indicate indices restricted to particle states, and the letters $p, q, r, \dots$ to indicate any state (either hole or particle, without restriction). We assume an energy level separating the filled (hole) states, present in $|0\rangle$, from the empty (particle) states. This energy level is called the Fermi level. Using this notation, we find that

$$i^\dagger |0\rangle = 0, \quad a|0\rangle = 0, \quad \langle 0| i = 0, \quad \langle 0| a^\dagger = 0.$$

It is convenient to define a new set of operators, sometimes called pseudo-creation and pseudo-annihilation operators (or quasi-operators), via

$$b_i^\dagger = a_i, \quad b_a^\dagger = a_a^\dagger, \quad b_i = a_i^\dagger, \quad b_a = a_a.$$

Thus $b_i^\dagger$ creates a vacancy in state $i$ while $b_i$ eliminates such a vacancy. The particle pseudo-operators are identical to the ordinary particle operators, while the hole pseudo-creation and pseudo-annihilation operators are equivalent to the ordinary hole annihilation and creation operators, respectively. The motivation for this notation is that all pseudo-annihilation operators acting to the right on the Fermi vacuum give zero and all pseudo-creation operators acting to the left on the Fermi vacuum also give zero,

$$b_p |0\rangle = 0, \quad \langle 0| b_p^\dagger = 0.$$

7. Normal products and Wick's theorem relative to the Fermi vacuum

Now we modify the concepts of normal products, contractions, and Wick's theorem so that they relate to a reference state (the Fermi vacuum) instead of the physical vacuum. A product of creation and/or annihilation operators is said to be in normal order relative to the Fermi vacuum $|0\rangle \equiv |\Phi_0\rangle = |ijk \dots n\rangle$ if all pseudo-creation operators $a^\dagger, \dots$ and $i, \dots$ are to the left of all pseudo-annihilation operators $a, \dots$ and $i^\dagger, \dots$. Using the notation

$$b_i^\dagger = a_i = i, \quad b_i = a_i^\dagger = i^\dagger, \quad b_a^\dagger = a_a^\dagger = a^\dagger, \quad b_a = a_a = a,$$

the product is in normal order if all the $b_p^\dagger$ operators are to the left of all the $b_p$ operators. Since

$$b_p |0\rangle = 0, \quad \langle 0| b_p^\dagger = 0,$$

the Fermi-vacuum expectation value of a normal-ordered product of such operators vanishes. To distinguish the new type of normal product from the previous type, it is often written as

$$N[ABC \dots] = (-1)^\sigma\, b_p^\dagger b_q^\dagger \dots b_u b_v,$$

instead of $n[ABC \dots]$, which is used when the ordering is relative to the physical vacuum. The power $\sigma$ of the phase factor is the parity of the permutation from $ABC \dots$ to $b_p^\dagger b_q^\dagger \dots b_u b_v$.
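Contractions relative to the Fermi vacuum will be denoted by brackets above the operators instead of below (rendered here as overlines, with a double overline for a second contracted pair), and we have

$$\overline{AB} \equiv AB - N[AB].$$

So for contractions relative to the Fermi vacuum we find that the only nonzero contractions are

$$\overline{i^\dagger j} = \delta_{ij}, \quad \overline{a b^\dagger} = \delta_{ab}.$$

A normal product with contractions is defined in the same way as in the case where it is relative to the physical vacuum:

$$N[ABC \dots \overline{R} \dots \overline{\overline{S}} \dots \overline{T} \dots \overline{\overline{V}} \dots] = (-1)^\sigma\, \overline{RT}\; \overline{SV} \dots N[ABC \dots]$$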

Quantity | True vacuum formalism | Fermi vacuum formalism
vacuum state | $|\mathrm{vac}\rangle$ | $|0\rangle$ or $|\Phi_0\rangle$
creation operator | $a_p^\dagger$ | $b_a^\dagger = a_a^\dagger = a^\dagger$, $b_i^\dagger = a_i = i$
annihilation operator | $a_p$ | $b_a = a_a = a$, $b_i = a_i^\dagger = i^\dagger$
normal product of operators | $n[ABC \dots]$ | $N[ABC \dots]$

If we recall the proof of Wick's theorem, we see immediately that the same proof applies to the particle-hole version of the theorem. We only have to replace everywhere the true-vacuum quantities with the corresponding Fermi-vacuum quantities, as indicated in the table given above. We can thus write immediately the particle-hole form of Wick's theorem as follows:

$$ABCD \dots = N[ABCD \dots] + \sum N[\text{all possible contractions}]$$

where, as indicated, the sum is over all possible contractions of one pair, two pairs, etc. The operators are particle-hole operators defined with respect to $|0\rangle$. Obviously, the usefulness of this theorem is at least partly due to the fact that the Fermi-vacuum expectation value of a normal product vanishes unless it is fully contracted, so that

$$\langle 0| A \dots B \dots C \dots D \dots |0\rangle = \sum \langle 0| N[\overline{A \dots B \dots C \dots D \dots}] |0\rangle$$

where the overline indicates that all operators are contracted and the sum is over all fully contracted normal products. From here on, unless explicitly stated otherwise, whenever we talk of the vacuum we will be referring to the Fermi vacuum, and whenever we talk of normal products or contractions, we are referring to these concepts relative to the Fermi vacuum.

8. Generalized Wick's theorem

To complete this phase of the analysis, we need one more theorem, the generalized Wick's theorem, dealing with products of normal products of operators. This is needed since we shall have to evaluate matrix elements of the normal-product operator $\hat{W}$ between various Slater determinants (not just the reference SD), as for example in

$$\langle \Phi_{ij\dots}^{ab\dots} | \hat{W} | \Phi_{lm\dots}^{de\dots} \rangle = \langle 0| i^\dagger j^\dagger \dots b a\, \hat{W}\, d^\dagger e^\dagger \dots m l |0\rangle$$

Here we have a vacuum expectation value of a product of three operator strings, each of which separately is in normal-product form, since

$$N[i^\dagger j^\dagger \dots b a] = i^\dagger j^\dagger \dots b a, \quad N[d^\dagger e^\dagger \dots m l] = d^\dagger e^\dagger \dots m l.$$
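The statement that only fully contracted terms survive in a Fermi-vacuum expectation value can be tried out numerically. The following is a minimal Python sketch, not part of the original notes: the names `Op`, `contraction`, and `fermi_vev` are hypothetical, spinorbitals are assumed orthonormal and identified by their labels, and the sign of each full contraction is generated by pairing the leftmost operator with every possible partner, picking up a factor of $-1$ for each operator the partner has to jump over.

```python
from typing import NamedTuple, Sequence


class Op(NamedTuple):
    """One creation or annihilation operator (illustrative data type)."""
    label: str    # spinorbital label, e.g. "i" or "a"
    hole: bool    # True for a hole (occupied) state, False for a particle state
    dagger: bool  # True for p-dagger, False for p


def contraction(left: Op, right: Op) -> int:
    """Fermi-vacuum contraction of two operators taken in the given order."""
    if left.label != right.label or left.hole != right.hole:
        return 0
    if left.hole:
        return int(left.dagger and not right.dagger)  # i-dagger j -> delta_ij
    return int(not left.dagger and right.dagger)      # a b-dagger -> delta_ab


def fermi_vev(ops: Sequence[Op]) -> int:
    """<0| ops |0> as the signed sum over all full contractions."""
    ops = tuple(ops)
    if not ops:
        return 1
    if len(ops) % 2:
        return 0  # an odd number of operators cannot be fully contracted
    first, rest = ops[0], ops[1:]
    total = 0
    for k, partner in enumerate(rest):
        pair = contraction(first, partner)
        if pair:
            # k operators stand between the two contracted ones
            total += (-1) ** k * pair * fermi_vev(rest[:k] + rest[k + 1:])
    return total


# Norm of a singly excited determinant: <0| i-dagger a a-dagger i |0>
single_norm = fermi_vev([
    Op("i", True, True), Op("a", False, False),
    Op("a", False, True), Op("i", True, False),
])
```

With the same function one can check the antisymmetry $\Phi_{ij}^{ab} = -\Phi_{ij}^{ba}$ by evaluating the overlap of the two determinants, which comes out with the opposite sign of the norm.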

The generalized Wick's theorem states that a general product of creation and annihilation operators in which some operator strings are already in normal-product form is given as the overall normal product of all the creation and annihilation operators plus the sum of all overall normal products with contractions except that, since contractions of pairs of operators that are already in normal order vanish, no contractions between pairs of operators within the same original normal product need be included:

$$N[A_1 A_2 \dots]\, N[B_1 B_2 \dots]\, N[C_1 C_2 \dots] = N[A_1 A_2 \dots B_1 B_2 \dots C_1 C_2 \dots] + {\sum}' N[\text{all possible contractions}]$$

where the sum is over contractions of one pair at a time, two pairs, etc., and the prime on the summation sign indicates that no internal contractions are included. Note that the case in which the original product contains some individual creation or annihilation operators not within any normal product is also included in the scope of the generalized Wick's theorem, since for such operators $A = N[A]$.

9. Normal-product form of operators with respect to the Fermi vacuum

One-electron operators: Let us consider a one-electron operator

$$\hat{F} = \sum_{pq} \langle p | \hat{f} | q \rangle\, p^\dagger q \tag{42}$$

Using Wick's theorem (contractions relative to the Fermi vacuum are written as overlines),

$$p^\dagger q = N[p^\dagger q] + \overline{p^\dagger q}.$$

The contracted term vanishes unless $p$ and $q$ are the same hole state (call it $i$), when it is equal to 1, and thus

$$\hat{F} = \sum_{pq} \langle p | \hat{f} | q \rangle\, N[p^\dagger q] + \sum_i \langle i | \hat{f} | i \rangle = \hat{F}_N + \sum_i \langle i | \hat{f} | i \rangle$$

where $\hat{F}_N$ is the normal-product form of the operator of Eq. (42),

$$\hat{F}_N = \sum_{pq} \langle p | \hat{f} | q \rangle\, N[p^\dagger q].$$

The expectation value of $\hat{F}_N$ for the Fermi vacuum is zero, i.e., $\langle 0 | \hat{F}_N | 0 \rangle = 0$. To show it, consider the four possible combinations of particle and hole operators in $p^\dagger q$. Case I: Both operators correspond to particle states, then $\langle 0 | N[a^\dagger b] | 0 \rangle = \langle 0 | a^\dagger b | 0 \rangle = 0$ since there is no particle state to annihilate in $|0\rangle$. Case II: Both operators correspond to hole states, then $\langle 0 | N[i^\dagger j] | 0 \rangle = -\langle 0 | j i^\dagger | 0 \rangle = 0$. Case III: One of the operators corresponds to a hole state and the other to a particle state,

such that $\langle 0 | N[i^\dagger a] | 0 \rangle = \langle 0 | i^\dagger a | 0 \rangle = 0$. Case IV: One of the operators corresponds to a particle state and the other one to a hole state, such that $\langle 0 | N[a^\dagger i] | 0 \rangle = \langle 0 | a^\dagger i | 0 \rangle = \langle 0 | \Phi_i^a \rangle = 0$. Therefore, we have

$$\langle 0 | \hat{F} | 0 \rangle = \sum_i \langle i | \hat{f} | i \rangle, \qquad \hat{F} = \hat{F}_N + \langle 0 | \hat{F} | 0 \rangle.$$

Note that $\hat{F}_N$ contains hole-hole, particle-particle, and hole-particle terms,

$$\hat{F}_N = \sum_{ij} f_{ij} N[i^\dagger j] + \sum_{ab} f_{ab} N[a^\dagger b] + \sum_{ia} f_{ia} N[i^\dagger a] + \sum_{ai} f_{ai} N[a^\dagger i]$$

$$\phantom{\hat{F}_N} = -\sum_{ij} f_{ij}\, j i^\dagger + \sum_{ab} f_{ab}\, a^\dagger b + \sum_{ia} f_{ia}\, i^\dagger a + \sum_{ai} f_{ai}\, a^\dagger i.$$

Two-electron operators: Next consider a two-electron operator,

$$\hat{G} = \frac{1}{2} \sum_{pqrs} \langle pq | \hat{g} | rs \rangle\, p^\dagger q^\dagger s r \tag{43}$$

The derivation of the normal-product form of this operator is left for homework.

VI. DENSITY-FUNCTIONAL THEORY

The solution of Schrödinger's equation in the clamped-nuclei approximation is a $3N$-dimensional function (not including the spin degrees of freedom), where $N$ is the number of electrons,

$$\hat{H} \Psi(x_1, x_2, \dots, x_N) = E \Psi(x_1, x_2, \dots, x_N).$$

Each wave function gives a unique electron density

$$\rho(\mathbf{r}) = N \sum_{s, s_2, \dots, s_N} \int d^3r_2 \dots d^3r_N\, |\Psi(x, x_2, \dots, x_N)|^2. \tag{44}$$

Note that since $\Psi$ is antisymmetric, it does not matter which particle is left out of the integrations. We see immediately that $\int d^3r\, \rho(\mathbf{r}) = N$. The other commonly used symbol denoting $\rho$ is $n$.

Obviously, solutions of electronic structure problems would be much easier if one could replace $\Psi$ by $\rho$, an object that is only three-dimensional. While it may initially seem impossible, attempts to do so go back to the early days of quantum mechanics, and we now have a solid mathematical foundation for such an approach. Here is a summary of major historical developments in this field.

1927 Llewellyn H. Thomas proposes to apply the expressions coming from the quantum statistical treatment of the uniform electron gas (the latter can be found in most statistical mechanics textbooks) to atoms. While the electron density in atoms is obviously non-uniform, Thomas assumed that it is uniform locally.

Enrico Fermi independently comes up with a similar idea.

Paul Dirac extends the theory to include the so-called exchange terms. This approach is now called the Thomas-Fermi-Dirac (TFD) method. The essence of this theory is that the energy of a system is written as a functional of the electron density, with all terms in the functional originating from the quantum statistical treatment of the electron gas.

Carl Weizsäcker proposes a correction to the kinetic energy term in TFD.

John Slater develops a method which is a combination of the HF method and TFD; in particular, the density is computed from the Slater determinant and the method solves one-electron equations similar to the HF equations. The Slater method was in many respects similar to the Kohn-Sham method discussed below, but was missing a rigorous derivation.

Pierre Hohenberg and Walter Kohn (HK) prove that there exists a functional of $\rho$ which upon minimization gives the exact ground-state energy. The method is called density-functional theory (DFT). However, HK say nothing on how to construct such a functional.

Kohn and Lu Jeu Sham (KS) derive one-electron equations similar to the HF equations that can be solved for spinorbitals, which then give the $\rho$ that minimizes the density functional. The functional used in the original KS method has several terms taken from the TFD method, and therefore this approach is now called the local-density approximation (LDA).

Walter Kohn receives the Nobel Prize in Chemistry for DFT.

Hundreds of approximations to the unknown exact density functional have been proposed by now, and DFT is the most used computational method in many fields of physics and chemistry.

A. Thomas-Fermi-Dirac method

Although it is often stated that the Thomas-Fermi-Dirac (TFD) method originates from quantum statistical mechanics, no statistical approach is needed to derive this

method in its basic form. The reason is that the statistical treatment is taken for temperature $T \to 0$, when one can use non-statistical quantum mechanics. The Thomas-Fermi (TF) theory expressions come from considering a system of noninteracting spin-1/2 fermions of mass equal to the electron mass placed in a cubic box. Then one adds the interelectron Coulomb interactions as a first-order correction, neglecting at this point the permutational symmetry of the wave function (as in the Hartree approach). The TFD extension fixes this deficiency, i.e., computes the first-order correction accounting for antisymmetry.

Let us consider a system of $N$ noninteracting spin-1/2 fermions of mass equal to the electron mass $m$ placed in a cubic box of side $L$ (volume $V = L^3$). Since the Hamiltonian is separable into Hamiltonians of individual particles, the solution of the Schrödinger equation for such a system reduces to solutions of single-particle equations, and then to separate solutions for each dimension, giving orbitals

$$\psi_{n_x, n_y, n_z}(\mathbf{r}) = \psi_x(x) \psi_y(y) \psi_z(z) = \sqrt{\frac{8}{V}}\, \sin(k_x x) \sin(k_y y) \sin(k_z z)$$

and orbital energies

$$\epsilon_{n_x, n_y, n_z} = \frac{\pi^2 \hbar^2}{2 m L^2} \left( n_x^2 + n_y^2 + n_z^2 \right)$$

where

$$n_x, n_y, n_z = 1, 2, \dots, \qquad k_x = \frac{\pi}{L} n_x, \quad k_y = \frac{\pi}{L} n_y, \quad k_z = \frac{\pi}{L} n_z.$$

We assumed that the box extends from 0 to $L$ in each dimension and that the potential is zero inside the box and infinite outside, so that the boundary conditions are $\psi_x(0) = \psi_x(L) = 0$, and similarly for the other dimensions. One may also assume periodic boundary conditions, $\psi_x(x) = \psi_x(x + L)$ and similarly for the other dimensions, and this assumption leads to the same results in the limit of a large number of particles. We assume that each orbital corresponds to two spinorbitals, one with $s = 1/2$ and one with $s = -1/2$. The total wave function for the ground state of this system is then the Slater determinant built from the $N$ spinorbitals with the lowest energies.
This means that each occupied orbital energy level is doubly occupied (the highest occupied energy level is called the Fermi level). The total energy of the system is

$$E = \sum_{n_x, n_y, n_z}^{\mathrm{occ}} \epsilon_{n_x, n_y, n_z}, \tag{45}$$

where the sum runs over the occupied spinorbitals, so that each occupied orbital is counted twice. If we put a dot in a three-dimensional coordinate system for each point $(n_x, n_y, n_z)$, the part of the space with positive coordinates will be divided into cubes of side 1. For large $N$, the surface formed by the largest such values $n_x, n_y, n_z$, limited by $n_x^2 + n_y^2 + n_z^2 \le r^2$ for some fixed, sufficiently large $r$, is well approximated by the surface of the sphere with the

radius $r$. The volume of the considered part of the space limited by this surface is $\frac{1}{8} \cdot \frac{4}{3} \pi r^3 = \frac{1}{6} \pi r^3$, so the number of states inside this surface is $n_r = \frac{1}{3} \pi r^3$, where we multiplied by 2 to include spin degeneracy. The number of states in a shell $(r, r + dr)$ is therefore $dn_r = \pi r^2 dr$. All these states have (approximately) the same orbital energies $\epsilon_r = (\pi^2 \hbar^2 / 2 m L^2)\, r^2$. Thus, we can obtain the total energy of the system by integrating

$$E = \int \epsilon_r\, dn_r = \int_0^{r_F} \frac{\pi^2 \hbar^2}{2 m L^2}\, r^2\, \pi r^2\, dr = \frac{\pi^3 \hbar^2}{10 m L^2}\, r_F^5$$

where $r_F$ denotes the radius corresponding to the Fermi level. This radius can be found from

$$N = \frac{1}{3} \pi r_F^3,$$

which gives

$$E = \frac{\pi^3 \hbar^2}{10 m L^2} \left( \frac{3N}{\pi} \right)^{5/3}.$$

This energy can be further written in terms of the electron (number) density $\rho = N/V$ as

$$E = \frac{\pi^3 \hbar^2}{10 m} \left( \frac{3N}{\pi L^3} \right)^{5/3} L^3 = \frac{\pi^3 \hbar^2}{10 m} \left( \frac{3\rho}{\pi} \right)^{5/3} V = C_F V \rho^{5/3} \tag{46}$$

where $C_F = \frac{3 \hbar^2}{10 m} (3\pi^2)^{2/3}$ is the so-called Fermi constant. We will later use atomic units, where this constant reduces to $C_F = \frac{3}{10} (3\pi^2)^{2/3}$. Note that this energy is just the kinetic energy, the only energy in the case of a noninteracting fermion gas.

For future reference, let us find the expressions for the Fermi energy and the Fermi wave vector. The Fermi energy is the orbital energy at $r_F$,

$$\epsilon_F = \frac{\pi^2 \hbar^2}{2 m L^2}\, r_F^2 = \frac{\pi^2 \hbar^2}{2 m L^2} \left( \frac{3N}{\pi} \right)^{2/3} = \frac{\pi^2 \hbar^2}{2 m} \left( \frac{3N}{\pi L^3} \right)^{2/3} = \frac{\pi^2 \hbar^2}{2 m} \left( \frac{3\rho}{\pi} \right)^{2/3} = \frac{\hbar^2 k_F^2}{2 m} \tag{47}$$

where the Fermi wave vector is

$$k_F = \frac{\sqrt{2 m \epsilon_F}}{\hbar} = \pi \left( \frac{3\rho}{\pi} \right)^{1/3} = \left( 3 \pi^2 \rho \right)^{1/3}. \tag{48}$$

Thomas and Fermi used the expression of Eq. (46) as the kinetic energy in their model even when it was applied to atoms, molecules, or solids, despite the fact that the electron density in such systems is obviously not constant. A critical assumption of the TF model is that the density can be assumed locally constant, the so-called local density approximation (LDA).

Next, Thomas and Fermi moved to the interacting electron gas, i.e., added to the Hamiltonian the electron Coulomb repulsion term as a perturbation operator

$$\hat{U}_{ee} = \frac{1}{2} \sum_{i \ne j}^{N} \frac{e^2}{|\mathbf{r}_i - \mathbf{r}_j|} \tag{49}$$

and included the first-order correction, i.e., the expectation value of $\hat{U}_{ee} = \hat{G}$ with the product of ground-state spinorbitals, in their energy expression. If we recall the derivation of Eq. (28),

$$\langle \Psi | \hat{G} \Psi \rangle = \frac{1}{2} \sum_{i,j=1}^{N} \left( g_{ijij} - g_{ijji} \right), \tag{50}$$

using just the product produces only the first term in this expression. This term can be written as

$$\frac{1}{2} \sum_{i,j=1}^{N} g_{ijij} = \frac{1}{2} \sum_{i,j=1}^{N} \sum_{s_1, s_2} \int d^3r_1\, d^3r_2\, \phi_i^*(x_1) \phi_j^*(x_2) \frac{e^2}{|\mathbf{r}_1 - \mathbf{r}_2|} \phi_i(x_1) \phi_j(x_2). \tag{51}$$

The sum over spinorbitals can be replaced by the electron density. To see it, let us write Eq. (44) for a Slater determinant. Due to the orthonormality of spinorbitals, the spinorbitals of the coordinates integrated over must be the same in the bra and in the ket. Thus the only surviving terms are those containing the squared modulus of a given spinorbital depending on $x$:

$$\rho(\mathbf{r}) = N \frac{(N-1)!}{(\sqrt{N!})^2} \sum_s \sum_{i=1}^{N} \phi_i^*(x) \phi_i(x) = \sum_s \sum_{i=1}^{N} \phi_i^*(x) \phi_i(x) = 2 \sum_{i=1}^{N/2} \psi_i^*(\mathbf{r}) \psi_i(\mathbf{r}) \tag{52}$$

where $1/\sqrt{N!}$ comes from the definition of the determinant and the factor $(N-1)!$ is the number of permutations (identical in the bra and ket) of spinorbitals other than $\phi_i(x)$. In the last step, we have summed over spin, recalling that pairs of spinorbitals are related to the same orbital. Using Eq. (52) we can write Eq. (51) as

$$\frac{e^2}{2} \int d^3r_1\, d^3r_2\, \rho(\mathbf{r}_1) \frac{1}{|\mathbf{r}_1 - \mathbf{r}_2|} \rho(\mathbf{r}_2) = J_H[\rho]. \tag{53}$$

This term is known as the Hartree energy and is denoted by $J_H[\rho]$. It describes the Coulombic interaction of the electron density with itself. For atoms, molecules, and solids, Thomas and Fermi included, of course, also the Coulomb interaction of electrons with nuclei,

$$\hat{V} = -\sum_{a=1}^{N_{\mathrm{nuc}}} \sum_{i=1}^{N} \frac{Z_a e^2}{|\mathbf{r}_i - \mathbf{R}_a|} = \sum_{i=1}^{N} \hat{v}(\mathbf{r}_i). \tag{54}$$

Since this is a one-electron operator, its expectation value with the ground-state Slater determinant is

$$\langle \Psi | \hat{V} \Psi \rangle = \sum_{i=1}^{N} v_{ii} = \int d^3r\, \rho(\mathbf{r})\, \hat{v}(\mathbf{r}) = V[\rho].$$

Since we now have a different kinetic energy at each point of space, we have to average the expression (46) over the space, obtaining in this way the kinetic energy of the TF method

$$T_{TF} = \int \frac{E}{V}\, d^3r = C_F \int \rho^{5/3}(\mathbf{r})\, d^3r.$$
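The large-$N$ argument leading to Eq. (46) can be checked by brute force. The sketch below is an illustration added here, not part of the original notes: it assumes atomic units ($\hbar = m = 1$), and the function names and the cutoff `n_max` are hypothetical. It fills the lowest hard-wall box levels with two electrons each and compares the summed energy with $C_F V \rho^{5/3}$.

```python
import math


def box_ground_state_energy(n_electrons, n_max=40, box_length=1.0):
    """Sum of the lowest spinorbital energies in a hard-wall cubic box (atomic units)."""
    prefactor = math.pi ** 2 / (2.0 * box_length ** 2)
    # n_x^2 + n_y^2 + n_z^2 for all orbitals with quantum numbers 1..n_max
    n_squared = sorted(
        nx * nx + ny * ny + nz * nz
        for nx in range(1, n_max + 1)
        for ny in range(1, n_max + 1)
        for nz in range(1, n_max + 1)
    )
    energy, remaining, last = 0.0, n_electrons, 0
    for n2 in n_squared:
        if remaining <= 0:
            break
        occupancy = min(2, remaining)  # two spinorbitals per orbital
        energy += occupancy * prefactor * n2
        remaining -= occupancy
        last = n2
    # The cube 1..n_max contains every level with n^2 <= n_max^2 and no more
    if remaining > 0 or last > n_max * n_max:
        raise ValueError("n_max is too small for this number of electrons")
    return energy


def thomas_fermi_energy(n_electrons, box_length=1.0):
    """Eq. (46): E = C_F * V * rho^(5/3) in atomic units."""
    volume = box_length ** 3
    rho = n_electrons / volume
    c_f = 0.3 * (3.0 * math.pi ** 2) ** (2.0 / 3.0)
    return c_f * volume * rho ** (5.0 / 3.0)


ratio_small = box_ground_state_energy(1000) / thomas_fermi_energy(1000)
ratio_large = box_ground_state_energy(10000) / thomas_fermi_energy(10000)
```

For hard walls the exact sum lies above the Thomas-Fermi value, because the states on the coordinate planes ($n_v = 0$) are excluded from the octant of the sphere, and the ratio should move toward 1 as $N$ grows.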

The total energy expression in the TF method is therefore

$$E_{TF}[\rho] = T_{TF}[\rho] + V[\rho] + J_H[\rho].$$

This functional of $\rho$ can be minimized with respect to $\rho$; we will not discuss these methods. For atoms, the functional has often been evaluated with densities obtained from the HF method.

The TFD method is an extension of the TF method that includes the permutational symmetry in evaluating the expectation value of the $\hat{U}_{ee}$ operator; the resulting term is sometimes called the Dirac exchange energy. In contrast to the Hartree term, which is valid for any set of orbitals, the Dirac term is explicitly computed with the orbitals of the noninteracting gas. The exchange integral of HF theory can be written in terms of the one-particle density matrix. We have

$$K = \frac{1}{2} \sum_{i,j}^{N} g_{ijji} = \frac{1}{2} \sum_{i,j=1}^{N} \sum_{s_1, s_2} \int d^3r_1\, d^3r_2\, \phi_i^*(x_1) \phi_j^*(x_2) \frac{e^2}{|\mathbf{r}_1 - \mathbf{r}_2|} \phi_j(x_1) \phi_i(x_2).$$

Let us first sum over spin. We have

$$\sum_{s_1} \psi_i^*(1) \sigma^*(1)\, \psi_j(1) \sigma'(1) \sum_{s_2} \psi_i(2) \sigma(2)\, \psi_j^*(2) \sigma'^*(2)$$

where $i$ and $j$ are now orbital indices and $\sigma$, $\sigma'$ are the spin functions. Note that $i$ is coupled with the same $\sigma$ in both places as it is the same spinorbital. Thus, if in the sum over $s_1$ we have, say, the $+-$ combination of spins, the same combination appears in the sum over $s_2$. Therefore, the only nonvanishing terms are $++$ and $--$, so we get an overall factor of 2 from the spin summation and we can write

$$K = \sum_{i}^{N/2} \sum_{j}^{N/2} \int d^3r_1\, d^3r_2\, \psi_i^*(\mathbf{r}_1) \psi_i(\mathbf{r}_2) \frac{e^2}{|\mathbf{r}_1 - \mathbf{r}_2|} \psi_j(\mathbf{r}_1) \psi_j^*(\mathbf{r}_2)$$

$$\phantom{K} = \frac{1}{4} \int d^3r_1\, d^3r_2\, \rho_1(\mathbf{r}_1, \mathbf{r}_2) \frac{e^2}{|\mathbf{r}_1 - \mathbf{r}_2|} \rho_1(\mathbf{r}_2, \mathbf{r}_1) = \frac{1}{4} \int d^3r_1\, d^3r_2\, |\rho_1(\mathbf{r}_1, \mathbf{r}_2)|^2 \frac{e^2}{|\mathbf{r}_1 - \mathbf{r}_2|}.$$

The quantity $\rho_1$ is the one-electron (reduced, i.e., summed over spin) density matrix

$$\rho_1(\mathbf{r}_1, \mathbf{r}_2) = 2 \sum_{i}^{N/2} \psi_i(\mathbf{r}_1) \psi_i^*(\mathbf{r}_2) \tag{55}$$

and the factor 2 in its definition leads to the factor 1/4 in the expression for $K$.

We will now compute the one-electron density matrix for the noninteracting electron gas. In contrast to what we did when deriving the kinetic energy expression, it is now more

convenient to assume periodic boundary conditions. One can prove that for large $N$ the two conditions give the same answers, but we will not do it since at this point we have made much more drastic approximations than could arise from a possible inconsistency resulting from different boundary conditions. With the periodic conditions, the orbitals are of the form

$$\psi_{n_x, n_y, n_z} = \frac{1}{\sqrt{V}}\, e^{i \mathbf{k} \cdot \mathbf{r}}$$

where $k_v = \frac{2\pi}{L} n_v$ with $n_v = 0, \pm 1, \pm 2, \dots$ for $v = x, y, z$. The density matrix of this system is

$$\rho_1(\mathbf{r}_1, \mathbf{r}_2) = \frac{2}{V} \sum_{n_x, n_y, n_z}^{\mathrm{occ}} e^{i \mathbf{k} \cdot (\mathbf{r}_1 - \mathbf{r}_2)}.$$

As before, for large $N$ we can change the summation to integration,

$$\rho_1(\mathbf{r}_1, \mathbf{r}_2) = \frac{2}{V} \int_{\mathrm{occ}} e^{i \mathbf{k} \cdot \mathbf{r}_{12}}\, dn_x\, dn_y\, dn_z = \frac{1}{4\pi^3} \int_{|\mathbf{k}| \le k_F} e^{i \mathbf{k} \cdot \mathbf{r}_{12}}\, d^3k$$

where in changing variables we used $dk_x = (2\pi/L)\, dn_x$ and so on, which gives the overall Jacobian $V/(8\pi^3)$. The upper limit of the integration was defined in Eq. (48). This integral can be evaluated in spherical coordinates, and this evaluation is given as homework. The result is

$$\rho_1(\mathbf{r}_1, \mathbf{r}_2) = \frac{1}{\pi^2 s^3} \left[ \sin(s k_F) - s k_F \cos(s k_F) \right] \tag{56}$$

where $s = |\mathbf{r}_{12}|$. It is now natural to view $\rho_1$ as a function of the variables $\mathbf{s} = \mathbf{r}_1 - \mathbf{r}_2$ and $\mathbf{r} = (\mathbf{r}_1 + \mathbf{r}_2)/2$. We see that it is independent of $\mathbf{r}$ and of the direction of $\mathbf{s}$, as expected for the uniform gas. Notice that $\rho_1$ does depend on $\rho$ via $k_F$.

To compute the Dirac exchange energy used in TFD, we assume as before that the uniform gas expression is valid locally and, with $\rho$ dependent on $\mathbf{r}$, average over space, obtaining

$$K_D = C_x \int \rho^{4/3}(\mathbf{r})\, d^3r, \qquad C_x = \frac{3}{4} \left( \frac{3}{\pi} \right)^{1/3}. \tag{57}$$

The derivation of this expression is left as homework.

B. Hohenberg-Kohn theorems

The first HK theorem states that, for the ground state of a system, the knowledge of $\rho$ allows one to determine $\Psi$ and vice versa. The latter is obvious from the definition of density. For systems consisting of atoms with no external potentials, the proof of the former part of this theorem is very simple.
Since the sources of the potential are nuclei and the electron interaction with a nucleus is singular, the density will have sharp peaks 47

exactly at the positions of nuclei. The steepness depends on the charge of a nucleus. Thus, the knowledge of density gives us the locations and charges of nuclei. Therefore, one can write Schrödinger's equation and solve it to find $\Psi$. The importance of this theorem is that it shows that the density alone gives all the needed information about the system. Also note that the theorem discusses only the exact ground-state density.

The second HK theorem states that there exists a functional of density, denoted by $E[\rho]$, that upon minimization with respect to $\rho$ gives the ground-state energy,

$$E[\rho] \ge E[\rho_0] = E_0$$

where $\rho(\mathbf{r}) > 0$ and $\int d^3r\, \rho(\mathbf{r}) = N$. The density is arbitrary except for satisfying the two conditions listed, which originate from the definition of $\rho$ in Eq. (44). To prove this theorem, we will follow the arguments given by Levy. We will start from the Ritz variational principle

$$E_0 = \min_{\Psi} \langle \Psi | \hat{H} \Psi \rangle$$

where $\Psi$ belongs to the Hilbert space of normalized antisymmetric $N$-electron functions. We can then write

$$E_0 = \min_{\Psi} \langle \Psi | \hat{H} \Psi \rangle = \min_{\rho} \left[ \min_{\Psi \to \rho} \langle \Psi | \hat{H} \Psi \rangle \right]$$

with $\rho$ constrained by the conditions specified in the theorem. The meaning of the double minimization is as follows. We go over the space of all possible $\rho$'s and for a given $\rho$ find all $\Psi$'s that give this $\rho$. Out of this set, we select the $\Psi$ that gives the lowest expectation value of the Hamiltonian. Clearly, if we go over all $\rho$'s, we will eventually find the ground-state energy, which proves the theorem.

One subtlety to discuss is whether for an arbitrary $\rho$ there exists an antisymmetric $\Psi$ which gives this $\rho$ via Eq. (44). One can indeed prove that this is the case (by constructing a set of orthonormal spinorbitals from the density and then constructing a Slater determinant from these spinorbitals). We say that each $\rho$ (fulfilling the constraints) is $N$-representable. However, we do not need to prove this theorem to complete the proof of the second HK theorem.
Since for each $\Psi$ there exists a $\rho$, we will sweep the space of all $\Psi$'s when going over all $\rho$'s. If there were $\rho$'s that are not $N$-representable (which is not the case), we could just ignore them.

The importance of the Hohenberg-Kohn work is mainly conceptual, stemming from the fact that it has put density-functional theory on solid mathematical ground, in contrast to the TFD and Slater methods, which were both based on ad hoc arguments. However, the HK theorems did not offer any new practical tools. The proof that the functional exists is via the wave functions, so it tells us nothing about finding the actual functional that could be applied without invoking wave functions. Significant efforts have been made by many researchers to find good approximations to such a functional, but in fact all the proposed pure density functionals work poorly. The family of methods that do

work, now known under the name DFT, are in fact not true DFT approaches, i.e., are not based on density alone. These methods originate from the Kohn-Sham ideas, which will be discussed next, and use spinorbitals in addition to densities.

C. Kohn-Sham method

Let us write the Hamiltonian of Eq. (19) with the following notation for the three consecutive terms:

$$\hat{H} = \hat{T} + \hat{V} + \hat{U}_{ee}.$$

The matrix elements of the multiplicative operator $\hat{V}$, which is the sum of one-electron terms

$$\hat{V} = -\sum_{a=1}^{N_{\mathrm{nuc}}} \sum_{i=1}^{N} \frac{Z_a e^2}{|\mathbf{r}_i - \mathbf{R}_a|} = \sum_{i=1}^{N} \hat{v}(\mathbf{r}_i),$$

can be written as an explicit functional of density,

$$\langle \Psi | \hat{V} \Psi \rangle = N \sum_{s_1, \dots, s_N} \int d^3r_1 \dots d^3r_N\, \hat{v}(\mathbf{r}_1)\, |\Psi(x_1, x_2, \dots, x_N)|^2 = \int d^3r_1\, \hat{v}(\mathbf{r}_1) \rho(\mathbf{r}_1). \tag{58}$$

We can therefore write the HK theorem as

$$E_0 = \min_{\rho} \left[ \int d^3r\, \hat{v}(\mathbf{r}) \rho(\mathbf{r}) + F_{HK}[\rho] \right]$$

where

$$F_{HK}[\rho] = T_{HK}[\rho] + U_{ee}^{HK}[\rho] = \min_{\Psi \to \rho} \langle \Psi | (\hat{T} + \hat{U}_{ee}) \Psi \rangle = \langle \Psi_{\min}[\rho] | (\hat{T} + \hat{U}_{ee}) \Psi_{\min}[\rho] \rangle.$$

There is nothing new in this equation except for introducing the notation. One might think that an explicit density functional can be obtained when the $\hat{U}_{ee}$ operator is neglected. This is not so, since the operator $\hat{T}$ is a differential operator. Thus, if we replace $\hat{V}$ by $\hat{T}$ in Eq. (58), this equation cannot be integrated to depend on density only. To overcome this difficulty, Kohn and Sham replaced $T_{HK}[\rho]$ by an expression used in the HF method, i.e., by the expectation value of $\hat{T}$ with a Slater determinant,

$$T_{HK}[\rho] \approx T_S[\{\phi_i[\rho]\}] = -\frac{1}{2} \sum_{i=1}^{N} \langle \phi_i | \nabla^2 \phi_i \rangle$$

where we started to use atomic units such that $\hbar = e = m_e = 1$. The notation introduced in $T_S$ indicates that spinorbitals can be considered to be determined by density. We may say that $T_S[\{\phi_i[\rho]\}]$ is an explicit functional of orbitals and an implicit functional of density

(we will not make use of these concepts). The density can be calculated from a Slater determinant using the Slater-Condon rules for the following electron-density operator,

$$\hat{\rho} = \sum_{i=1}^{N} \delta(\mathbf{r} - \mathbf{r}_i),$$

i.e.,

$$\rho(\mathbf{r}) = \langle \Psi | \hat{\rho} \Psi \rangle = \sum_{i=1}^{N} \sum_{s_1} \int d^3r_1\, \phi_i^*(x_1) \delta(\mathbf{r} - \mathbf{r}_1) \phi_i(x_1) = \sum_{s} \sum_{i=1}^{N} |\phi_i(x)|^2 = \sum_{i=1}^{N} |\tilde{\phi}_i(\mathbf{r})|^2, \tag{59}$$

where we replaced $s_1$ by $s$ in the next-to-last expression, $\tilde{\phi}_i$ is the orbital part of the spinorbital $\phi_i$, and where we assumed pure spin states, so that the sum over the spin part is one. So far we do not know how to determine the spinorbitals; we will get to this issue later on.

The next important idea of Kohn and Sham was to write the $U_{ee}^{HK}[\rho]$ term as a sum of the Coulomb interaction of the density with itself,

$$E_H[\rho] = \frac{1}{2} \int d^3r\, d^3r'\, \frac{\rho(\mathbf{r}) \rho(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|},$$

called the Hartree energy, and of the remainder. This term appears in the HF theory as the expectation value of the $N$-electron Coulomb operator and in the TFD theory, so this choice was natural. With both discussed approximations, the $F_{HK}[\rho]$ functional can be written as

$$F_{HK}[\rho] = T_S[\{\phi_i[\rho]\}] + E_H[\rho] + E_{xc}[\rho]$$

where the last term, called the exchange-correlation energy, collects all interactions not included in the first two terms. It is worth writing this term explicitly:

$$E_{xc}[\rho] = T_{HK}[\rho] - T_S[\{\phi_i[\rho]\}] + U_{ee}^{HK}[\rho] - E_H[\rho].$$

Thus, despite the label "exchange-correlation", this term includes kinetic energy corrections. The term is expected to correct for the electron correlation effects not included in $E_H[\rho]$ and for the effects resulting from antisymmetrization of the Slater determinant, which in the HF method lead to the exchange operator. When a concrete $E_{xc}[\rho]$ is constructed, one usually considers separately the correlation and exchange components, denoted by $E_c[\rho]$ and $E_x[\rho]$, respectively. All the hundreds of DFT methods in use differ by the selection of $E_{xc}$. In the simplest case, used in the original KS paper, this term is taken from the TFD theory. We will discuss various choices of $E_{xc}$ later on.
The complete KS functional can be written as

$$E_0^{KS}[\rho] = T_S[\{\phi_i[\rho]\}] + \int d^3r\, \hat{v}(\mathbf{r}) \rho(\mathbf{r}) + E_H[\rho] + E_{xc}[\rho].$$
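To make the four terms of the KS functional concrete, they can be evaluated for a single spherically symmetric orbital on a radial grid. The sketch below is an illustration added here, not part of the original notes. Its assumptions: atomic units, one occupied orbital (the hydrogen 1s function in the usage line), no self-consistency, and the Dirac expression of Eq. (57) with a minus sign standing in for $E_{xc}$. All function names and grid parameters are hypothetical.

```python
import math

C_X = 0.75 * (3.0 / math.pi) ** (1.0 / 3.0)  # constant of Eq. (57), atomic units


def radial_integral(values, radii):
    """Trapezoid rule for the integral of 4*pi*r^2 * f(r) on a uniform radial grid."""
    h = radii[1] - radii[0]
    w = [4.0 * math.pi * r * r * f for r, f in zip(radii, values)]
    return h * (sum(w) - 0.5 * (w[0] + w[-1]))


def ks_energy_terms(orbital, nuclear_charge=1.0, r_max=25.0, n_points=5001):
    """Terms of the KS functional for one spherically symmetric occupied orbital."""
    h = r_max / (n_points - 1)
    radii = [k * h for k in range(n_points)]
    phi = [orbital(r) for r in radii]
    rho = [p * p for p in phi]  # Eq. (59) with a single orbital

    # T_S = (1/2) * integral of |grad phi|^2, i.e. -(1/2)<phi|laplacian phi> by parts.
    # End points are left at zero: r^2 vanishes at r = 0 and phi has decayed at r_max.
    dphi = [0.0] * n_points
    for k in range(1, n_points - 1):
        dphi[k] = (phi[k + 1] - phi[k - 1]) / (2.0 * h)
    t_s = 0.5 * radial_integral([d * d for d in dphi], radii)

    # Nuclear attraction, v(r) = -Z/r; the r = 0 point carries zero weight
    v_ext = radial_integral(
        [-nuclear_charge * d / r if r > 0.0 else 0.0 for r, d in zip(radii, rho)],
        radii,
    )

    # Hartree potential of a spherical density: charge inside r over r, plus outer shells
    shell = [4.0 * math.pi * r * r * d for r, d in zip(radii, rho)]
    ring = [4.0 * math.pi * r * d for r, d in zip(radii, rho)]
    inner = [0.0] * n_points
    for k in range(1, n_points):
        inner[k] = inner[k - 1] + 0.5 * h * (shell[k - 1] + shell[k])
    outer = [0.0] * n_points
    for k in range(n_points - 2, -1, -1):
        outer[k] = outer[k + 1] + 0.5 * h * (ring[k] + ring[k + 1])
    v_h = [outer[0]] + [inner[k] / radii[k] + outer[k] for k in range(1, n_points)]
    e_h = 0.5 * radial_integral([d * v for d, v in zip(rho, v_h)], radii)

    # Dirac exchange used in place of E_xc (an assumption of this sketch)
    e_x = -C_X * radial_integral([d ** (4.0 / 3.0) for d in rho], radii)
    return {"T_S": t_s, "V": v_ext, "E_H": e_h, "E_x": e_x}


terms = ks_energy_terms(lambda r: math.exp(-r) / math.sqrt(math.pi))
total = sum(terms.values())
```

For the hydrogen 1s orbital the analytic values of the first three terms are $1/2$, $-1$, and $5/16$ hartree. The total differs from the exact $-1/2$ hartree because the Hartree term contains the interaction of the single electron with itself, which the approximate exchange term cancels only in part.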

We will find its minimum in a way analogous to that used in the derivation of the HF method. Since $\rho$ is expressed in terms of the $\phi_i$ via Eq. (59), the variation of $\rho$ will be expressed via variations of the $\phi_i$'s. Thus, we will vary, as in the HF method,

$$\phi_i \to \phi_i + \delta\phi_i$$

and this variation will imply the variation of $\rho$,

$$\rho(\mathbf{r}) \to \rho(\mathbf{r}) + \delta\rho(\mathbf{r}) = \sum_{s} \sum_{i=1}^{N} \left( \phi_i^*(x) + \delta\phi_i^*(x) \right) \phi_i(x) = \rho(\mathbf{r}) + \sum_{s} \sum_{i=1}^{N} \phi_i(x)\, \delta\phi_i^*(x).$$

We now have to impose the two conditions on $\rho$. The positiveness condition is automatically satisfied if the definition (59) is used. The normalization to $N$ will be achieved if each orbital is normalized. We will in addition require that the orbitals are orthogonal to each other, since only then does Eq. (59) hold. Thus, the conditions will be imposed in exactly the same way as in the HF method, i.e., we will minimize

$$L_{KS}[\rho] = E_0^{KS}[\rho] - \sum_{j \ge i}^{N} \lambda_{ij} \left( \langle \phi_i | \phi_j \rangle - \delta_{ij} \right).$$

The linear variations of the kinetic energy, the nuclear attraction, the Hartree, and the constraint terms are exactly the same as in the HF method:

$$\delta T_S = -\frac{1}{2} \sum_{i=1}^{N} \langle \delta\phi_i | \nabla^2 \phi_i \rangle$$

$$\delta \int d^3r\, \hat{v}(\mathbf{r}) \rho(\mathbf{r}) = \sum_{i=1}^{N} \langle \delta\phi_i | \hat{v} \phi_i \rangle$$

$$\delta E_H[\rho] = \sum_{i=1}^{N} \langle \delta\phi_i | \hat{J} \phi_i \rangle$$

$$\delta \sum_{j \ge i}^{N} \lambda_{ij} \left( \langle \phi_i | \phi_j \rangle - \delta_{ij} \right) = \sum_{j \ge i}^{N} \lambda_{ij} \langle \delta\phi_i | \phi_j \rangle.$$

For the exchange-correlation (xc) energy term, we have to use symbolic notation since we do not know this term explicitly. This term is assumed to have the form of an integral of the so-called xc energy density,

$$E_{xc}[\rho] = \int d^3r\, e_{xc}[\rho](\mathbf{r})$$

where $e_{xc}(\mathbf{r})$ is some function of $\rho(\mathbf{r})$; in the simplest cases it can be just a power of $\rho$. Therefore, we can write

$$E_{xc}[\rho + \delta\rho] = E_{xc}[\rho] + \delta E_{xc}[\rho] + O[(\delta\rho)^2] = E_{xc}[\rho] + \int d^3r\, e'_{xc}(\mathbf{r})\, \delta\rho(\mathbf{r}) + O[(\delta\rho)^2]$$

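where $e'_{xc}(\mathbf r)$ is defined as the function which, integrated with $\delta\rho(\mathbf r)$, gives $\delta E_{xc}[\rho]$. Thus, this is an analog of the standard derivative: $f(x + \delta x) = f(x) + f'(x)\,\delta x + O((\delta x)^2)$. We will use the notation

$$e'_{xc}(\mathbf r) \equiv \frac{\delta E_{xc}}{\delta\rho}(\mathbf r) \equiv v_{xc}(\mathbf r)$$

and we call $v_{xc}(\mathbf r)$ the functional derivative of $E_{xc}[\rho]$. Notice that $v_{xc}(\mathbf r)$ is a function of $\mathbf r$, whereas $E_{xc}[\rho]$ is just a single real number for a given $\rho$. Although this definition may appear abstract, it is simple to find $v_{xc}$ in practice. For example, the exchange energy density in TFD is $e_x(\mathbf r) = C_x\, \rho(\mathbf r)^{4/3}$, so that

$$E_x[\rho] = C_x \int d^3 r\, \rho(\mathbf r)^{4/3},$$

$$E_x[\rho + \delta\rho] = C_x \int d^3 r\, \left( \rho(\mathbf r) + \delta\rho(\mathbf r) \right)^{4/3} = C_x \int d^3 r \left( \rho(\mathbf r)^{4/3} + \frac{4}{3} \rho(\mathbf r)^{1/3}\, \delta\rho(\mathbf r) + \dots \right),$$

where we applied the Taylor expansion of $f(x) = x^{4/3}$ (valid for any value $\rho(\mathbf r) = x$). Comparing with the definition above, the factor multiplying $\delta\rho(\mathbf r)$ under the integral is the functional derivative, so $v_x(\mathbf r) = \frac{4}{3} C_x\, \rho(\mathbf r)^{1/3}$ in this case. One may also comment that the familiar derivation of the Euler-Lagrange equations in classical mechanics uses concepts analogous to those defined above and can also be formulated using functional derivatives.

The derivation of a formula for $v_{xc}(\mathbf r)$ gets a bit more complicated if the exchange-correlation energy depends also on derivatives of $\rho$, as will be discussed later on. In this case, we have

$$E_{xc}[\rho] = \int d^3 r\, e_{xc}[\rho, \nabla\rho](\mathbf r).$$

The quantity $e_{xc}[\rho, \nabla\rho](\mathbf r)$ is some concrete function of $\rho$ and of $\rho_i \equiv \partial\rho / \partial x_i$. Although the derivatives $\rho_i$ are determined by $\rho$, we can first treat them as independent variables [as in $df(x, y(x)) = (\partial f/\partial x)\,dx + (\partial f/\partial y)\,dy = (\partial f/\partial x)\,dx + (\partial f/\partial y)(dy/dx)\,dx$]. At a given point $\mathbf r$ and for a given $\rho$, the linear variation of $e_{xc}[\rho, \nabla\rho](\mathbf r)$ is the sum of the increments $\delta\rho$ and $\delta\rho_i$ multiplied by the regular partial derivatives of $e_{xc}$ with respect to these variables:

$$\delta e_{xc} = e_{xc}[\rho + \delta\rho, \rho_1 + \delta\rho_1, \rho_2 + \delta\rho_2, \rho_3 + \delta\rho_3] - e_{xc}[\rho, \nabla\rho] - O[(\delta\rho)^2] = \frac{\partial e_{xc}}{\partial\rho}\, \delta\rho + \frac{\partial e_{xc}}{\partial\rho_1}\, \delta\rho_1 + \frac{\partial e_{xc}}{\partial\rho_2}\, \delta\rho_2 + \frac{\partial e_{xc}}{\partial\rho_3}\, \delta\rho_3.$$

We now have to eliminate the increments $\delta\rho_i$ in favor of $\delta\rho$, similarly as in the derivation of the Euler-Lagrange equations.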
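This can be done by integrating by parts:

$$\int d^3 r\, \frac{\partial e_{xc}}{\partial\rho_i}\, \delta\rho_i = \int d^3 r\, \frac{\partial e_{xc}}{\partial\rho_i}\, \frac{\partial\, \delta\rho}{\partial x_i} = \left. \frac{\partial e_{xc}}{\partial\rho_i}\, \delta\rho \right|_{\text{surface}} - \int d^3 r\, \frac{\partial}{\partial x_i} \left( \frac{\partial e_{xc}}{\partial\rho_i} \right) \delta\rho.$$

The surface term vanishes since $\delta\rho$ vanishes at infinity, and we eventually have

$$\int d^3 r\, v_{xc}(\mathbf r)\, \delta\rho = \int d^3 r \left[ \frac{\partial e_{xc}}{\partial\rho} - \sum_{i} \frac{\partial}{\partial x_i} \frac{\partial e_{xc}}{\partial\rho_i} \right] \delta\rho.$$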
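The definition of the functional derivative can be checked numerically for the TFD exchange example. The sketch below is only an illustration under stated assumptions: it uses a one-dimensional grid instead of three-dimensional space, a hypothetical Gaussian model density, a hypothetical small variation, and the Dirac value of the constant with the sign absorbed into `C_X`. None of these choices come from the notes. It compares the exact change $E_x[\rho+\delta\rho]-E_x[\rho]$ with the linear term $\int v_x\,\delta\rho$, where $v_x = \frac{4}{3}C_x\rho^{1/3}$.

```python
import math

# Assumption: Dirac exchange constant, with the minus sign absorbed into C_X.
C_X = -0.75 * (3.0 / math.pi) ** (1.0 / 3.0)

H = 0.01                                     # grid spacing (hypothetical)
GRID = [-6.0 + H * k for k in range(1201)]   # illustrative 1D grid, not 3D space


def integrate(values):
    """Riemann sum on the uniform grid."""
    return H * sum(values)


def exchange_energy(density):
    """E_x[rho] = C_X * integral of rho^(4/3)."""
    return C_X * integrate([r ** (4.0 / 3.0) for r in density])


# Hypothetical model density and a small positive variation of it.
rho = [math.exp(-x * x) for x in GRID]
eps = 1.0e-4
delta_rho = [eps * math.exp(-2.0 * x * x) for x in GRID]

# Functional derivative from the Taylor expansion: v_x = (4/3) C_X rho^(1/3).
v_x = [(4.0 / 3.0) * C_X * r ** (1.0 / 3.0) for r in rho]

exact_change = (exchange_energy([r + d for r, d in zip(rho, delta_rho)])
                - exchange_energy(rho))
linear_change = integrate([v * d for v, d in zip(v_x, delta_rho)])

print("exact change :", exact_change)
print("linear term  :", linear_change)
print("difference   :", exact_change - linear_change)
```

The difference between the two numbers is of second order in the variation, so it should shrink by roughly a factor of 100 when `eps` is reduced by a factor of 10.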

Using the definition of the functional derivative, we can write the linear variation of the xc energy in terms of variations of orbitals as

$$\delta E_{xc}[\rho] = \int d^3 r\, v_{xc}(\mathbf r)\, \delta\rho(\mathbf r) = \sum_{s} \sum_{i=1}^{N} \int d^3 r\, v_{xc}(\mathbf r)\, \phi_i(x)\, \delta\phi_i^*(x).$$

Now we have determined the variations of all terms in $L^{KS}[\rho]$. Assuming that all variations of spinorbitals are zero except for that of spinorbital $\phi_i^*$ (or, alternatively, noticing that all variations are independent), we get

$$\left\langle \delta\phi_i \left| \left( -\frac{1}{2}\nabla^2 + \hat v + \hat J + v_{xc} \right) \phi_i \right. \right\rangle - \sum_{j} \lambda_{ij} \langle \delta\phi_i | \phi_j \rangle = 0.$$

As in the HF case, this implies that the ket has to be identically equal to zero:

$$\left( -\frac{1}{2}\nabla^2 + \hat v(\mathbf r) + \hat J(\mathbf r) + v_{xc}(\mathbf r) \right) \phi_i(x) = \sum_{j} \lambda_{ij}\, \phi_j(x).$$

Since $\rho$ is invariant under unitary transformations of the orbitals, so is $v_{xc}(\mathbf r)$, and we can diagonalize the matrix $\lambda_{ij}$ by a unitary transformation, obtaining the canonical KS equations

$$\left( -\frac{1}{2}\nabla^2 + \hat v(\mathbf r) + \hat J(\mathbf r) + v_{xc}(\mathbf r) \right) \phi_i(x) = \epsilon_i\, \phi_i(x).$$

The KS equations are similar to the HF equations. The major difference is the presence of the $v_{xc}$ potential and the absence of the $\hat K$ operator. One might think that it would be a good idea to include $\hat K$, but the predictions of such methods are poor. However, there is a whole family of density functionals that add a fraction of $\hat K$ while at the same time subtracting an equivalent part of $v_x$. Such approaches are called hybrid DFT methods. Another major difference between the HF and KS approaches is that in the former case one computes the total energy of the system as an expectation value of the Hamiltonian, whereas in the latter case one uses the appropriate KS functional. Of course, this is consistent with the way each type of equations was obtained.

D. Local density approximation

The exchange-correlation energy is defined in the KS method as

$$E_{xc} = T + U_{ee} - T_S - E_H,$$

where $T$ is the exact kinetic energy and $U_{ee}$ is the exact electron repulsion energy. Thus, despite its name, it should also correct for the deficiencies in the description of the kinetic energy by $T_S$.
However, little work has been done in this direction; major efforts in the field have been directed at improving the description of components resulting from electron

correlation and electron exchange (however, in the so-called meta-GGA theories that will be discussed later, one uses terms related to kinetic energies).

The electron repulsion term is partly accounted for in the KS approach by the $E_H$ term, the Coulomb repulsion of the density with itself. This term is the same as in the HF method, except that it is computed with KS rather than HF densities. In the HF approach, this term does not include any correlation effects (if the electron correlation energy is defined as $E_{corr} = E_{exact} - E_{HF}$). Thus, we do not expect it to completely describe electron correlation in the KS approach. We need further contributions. The need for some term related to electron exchange is clear when we realize that the exchange operator of the HF method is missing in the KS orbital equation. It would be simple to add this operator, and indeed the so-called hybrid methods, which will be discussed later on, do it. However, just adding the complete $\hat K$ was found to give poor results.

The truth is that the meaning of the words "exchange" and "correlation" in relation to $E_{xc}$ should be taken only in a loose sense. Nevertheless, there is a rich literature devoted to constructing $E_{xc}$ functionals and most of this work is based on solid physics. One often discusses the two terms separately, writing

$$E_{xc} = E_c + E_x.$$

So how does one get any expression for $E_{xc}$? The preceding discussion might indicate that this is an impossible task. However, as usual in physics, one studies simple, exactly solvable models and tries to design expressions which work for such models. One of the most important models is the homogeneous interacting electron gas (HIEG). As discussed in Sec. ??, the TFD model is derived from this physical system. This also means that if the TFD model is applied to the HIEG, it is expected to work very well, and this is indeed the case. The KS approach also uses the local density approximation (LDA), as does TFD.
In fact, the original KS model of 1965 is often called the LDA approach. It is similar to TFD in several respects. First, the terms $V$ and $E_H$ are identical. Next, Kohn and Sham also took $E_x$ from TFD: $E_x = K_D$, where $K_D$ is defined by Eq. (57). Thus the main difference is the use of $T_S$ instead of $T_{TF}$, which was the main feature of Slater's density functional theory. The $E_c$ term was usually set to zero in early LDA variants, although there were some attempts to approximate this term based on the perturbation theory of the interacting electron gas. Note that while the noninteracting electron gas problem can be solved exactly, no analytic solutions exist for the interacting gas except in the limits of very large and very small densities.

The use of LDA may seem to be a huge approximation for atoms, molecules, and solids. Indeed, in its initial form KS/LDA did not work well for molecules, in many cases predicting that well-known molecules are not bound. The greatest successes of this approach were for metals, where the conduction electrons resemble an electron gas.
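To make the word "local" concrete, the following sketch evaluates an LDA-type exchange energy $E_x = C_x \int d^3r\, \rho(\mathbf r)^{4/3}$ for a given density by radial quadrature: the energy density at each point depends only on the value of $\rho$ at that point. The assumptions are loudly hypothetical: the constant `C_X` is taken as the Dirac value with the sign absorbed (Eq. (57) of the notes is not reproduced here), the density is the exact hydrogen-atom ground-state density $\rho(r) = e^{-2r}/\pi$ in atomic units, and no spin polarization is included.

```python
import math

# Assumption: Dirac exchange constant with the minus sign absorbed into C_X.
C_X = -0.75 * (3.0 / math.pi) ** (1.0 / 3.0)


def hydrogen_density(r):
    """Ground-state density of the hydrogen atom in atomic units."""
    return math.exp(-2.0 * r) / math.pi


def lda_exchange_energy(density, r_max=30.0, n=30000):
    """C_X * integral of rho^(4/3) over all space, trapezoid rule in r."""
    h = r_max / n
    total = 0.0
    for k in range(n + 1):
        r = k * h
        weight = 0.5 if k in (0, n) else 1.0
        # 4 pi r^2 is the volume element for a spherically symmetric density.
        total += weight * 4.0 * math.pi * r * r * density(r) ** (4.0 / 3.0)
    return C_X * h * total


e_x_lda = lda_exchange_energy(hydrogen_density)

# Closed form of the same integral, used only as a check of the quadrature:
# integral of r^2 exp(-a r) dr from 0 to infinity equals 2 / a^3, with a = 8/3.
e_x_analytic = C_X * 4.0 * math.pi ** (-1.0 / 3.0) * 2.0 / (8.0 / 3.0) ** 3

print(e_x_lda, e_x_analytic)
```

Under these assumptions the result is about -0.213 hartree. The function accepts any spherically symmetric density, so other model densities can be passed in the same way.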

The LDA method is still widely used. The main difference between the modern LDA and the original KS version is the addition of an $E_c$ term fitted to nearly exact numerical calculations for the interacting uniform electron gas performed by Ceperley and Alder. The calculations were performed using the diffusion Monte Carlo (DMC) method, which will be discussed later on. The only quantity obtained by Ceperley and Alder was the total energy of the electron gas as a function of density. One can then arbitrarily define the correlation energy as

$$E_c[\rho] = E_{total}[\rho] - T_S[\rho] - E_D[\rho]. \tag{60}$$

These numerical results were then fitted by some simple analytic functions. One may notice that there are no $V$ and $E_H$ terms in this equation. The reason is that for the interacting uniform electron gas, one has to use a uniform positive background to compensate for the electron charges, and the Coulomb interactions included in these two terms add up to zero. Notice further the arbitrariness of definition (60): the two subtracted terms are not the exact kinetic and exchange energies of the system but some approximations of these quantities.

The fits of $E_c$ are usually expressed in terms of the quantity $r_s$, called the Wigner-Seitz radius and defined by

$$\frac{4}{3}\pi r_s^3 = \frac{1}{\rho} = \frac{V}{N},$$

i.e., it is the radius of a sphere with volume equal to the volume occupied by one electron. $E_c$ is usually expressed as

$$E_c = \int \rho(\mathbf r)\, \epsilon_c(\mathbf r)\, d^3 r,$$

where $\epsilon_c(\mathbf r)$ is called the correlation energy density. Several fits of this quantity have been published; a particularly simple one was developed by Chachiyo in 2016,

$$\epsilon_c = a \ln\left( 1 + \frac{b}{r_s} + \frac{b}{r_s^2} \right),$$

where $a$ and $b$ are fit parameters. Other fits have been published by Vosko, Wilk, and Nusair (VWN) in 1980 and by Perdew and Wang (PW92).

E. Generalized gradient approximations (GGA)

LDA was developed using the uniform electron gas as the underlying model, a system where $\nabla\rho(\mathbf r) = 0$, but the resulting formulas are applied to systems where $\nabla\rho(\mathbf r) \neq 0$.
It is therefore natural to include $\nabla\rho(\mathbf r)$ in DFT. Attempts to do so have a long history: the Weizsäcker correction to the TF kinetic energy is the earliest example. The expansion

in powers of $\nabla\rho(\mathbf r)$ was discussed in the HK 1964 paper. This expansion, now called the gradient expansion approximation (GEA), is usually expressed in terms of the quantity

$$s(\mathbf r) = \frac{|\nabla\rho(\mathbf r)|}{2 k_F\, \rho(\mathbf r)} = \frac{|\nabla\rho(\mathbf r)|}{2 (3\pi^2)^{1/3} \rho(\mathbf r)^{4/3}}$$

or in terms of

$$x(\mathbf r) = \frac{|\nabla\rho(\mathbf r)|}{\rho(\mathbf r)^{4/3}}.$$

The exchange energy can then be written as

$$E_x = C_x \int \rho^{4/3}(\mathbf r) \left( 1 + D_x s^2(\mathbf r) \right) d^3 r = C_x \int \rho^{4/3}(\mathbf r)\, F_x(s)\, d^3 r,$$

where $F_x(s)$ is called the enhancement factor. Notice that this expression does reduce to the uniform gas limit for $\nabla\rho(\mathbf r) = 0$. Note also that there are no terms linear in the components of $\nabla\rho(\mathbf r)$. The reason is that $E_x$ is assumed to be a quantity independent of the external potential, i.e., dependent only on electron-electron interactions. Thus, this quantity should be invariant under rotations. The coefficient $D_x$ can be determined by requiring that $E_x$ satisfies some exact constraints or by fitting to experimental data or to results from wave-function-based calculations. We will not discuss these issues since calculations applying GEA have demonstrated that it gives poor results, usually worse than LDA. The underlying reason is that GEA violates several exact conditions that LDA does satisfy (again, we will not discuss these issues).

The problems of GEA were solved by a family of methods known under the name generalized gradient approximations (GGA). To explain heuristically the main idea of these solutions, let us consider the exchange hole defined as

$$\rho_x(\mathbf r_1, \mathbf r_2) = -\frac{1}{2}\, \frac{|\rho_1(\mathbf r_1, \mathbf r_2)|^2}{\rho(\mathbf r_1)},$$

where $\rho_1$ has been defined by Eq. (55). Numerical GEA calculations show that while this quantity is well reproduced by GEA for small $|\mathbf r_1 - \mathbf r_2|$, in some range of intermediate values of the interelectron separation the values of the exchange hole are much too large. One simple solution is to cut off this region in numerical calculations. However, the unphysical behaviour is clearly related to the enhancement factor $F_x(s)$ increasing too fast.
Thus, the GGA methods use some functions multiplying $s^2$ that damp this growth. One of the popular GGA enhancement factors was introduced by Becke in 1988,

$$F_x(x) = 1 + \frac{\beta x^2}{1 + 6\beta x \sinh^{-1}(x)},$$

with the parameter $\beta$ fitted to reproduce atomic HF energies. One can plot this function to see that it increases more slowly than $x^2$ for large $x$. GGAs perform much better than LDA for virtually all systems and are now the mainstream DFT methods.
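The damping of the gradient correction can be seen by tabulating the two enhancement factors side by side. This is a minimal sketch, assuming the Becke form $F_x(x) = 1 + \beta x^2 / (1 + 6\beta x \sinh^{-1} x)$ and, for comparison, an undamped quadratic factor $1 + \beta x^2$ with the same coefficient. The value `BETA = 0.0042` is an assumption (the value commonly quoted for Becke's fit), and the function names are hypothetical.

```python
import math

BETA = 0.0042  # assumption: commonly quoted value of Becke's fitted parameter


def f_quadratic(x, beta=BETA):
    """Undamped, GEA-like enhancement factor: grows like x^2."""
    return 1.0 + beta * x * x


def f_becke(x, beta=BETA):
    """Damped enhancement factor in the Becke 1988 form."""
    return 1.0 + beta * x * x / (1.0 + 6.0 * beta * x * math.asinh(x))


sample_points = (0.5, 1.0, 5.0, 20.0, 100.0)
print("      x    quadratic       damped")
for x in sample_points:
    print(f"{x:7.1f}  {f_quadratic(x):11.4f}  {f_becke(x):11.4f}")
```

For small `x` the two factors nearly coincide, while for large `x` the denominator grows like $x \ln x$, so the damped factor rises only like $x/\ln x$ instead of $x^2$.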

F. Beyond GGA

To build on the success of GGA, one can think about including further terms in GEA, starting from the $s^4$ term. This leads to a family of methods called meta-GGA, which also use terms dependent on the so-called spinorbital kinetic energy density

$$\tau(\mathbf r) = \frac{1}{2} \sum_{s} \sum_{i=1}^{N} |\nabla\phi_i(x)|^2.$$

Another extension of the GGA approach is the inclusion of the HF exchange operator in the KS one-electron equations, which leads to a family of the so-called hybrid GGA methods. The HF exchange is, of course, calculated with KS orbitals, so it is different from the actual HF exchange. This term is often called exact exchange, but of course it is not exact. The HF exchange operator is multiplied by some fractional number $\alpha$ and the exchange potential $v_x$ is then multiplied by $1 - \alpha$. There are also methods called range-separated hybrid (RSH) methods, which admix the HF exchange in a variable amount depending on the value of $|\mathbf r_1 - \mathbf r_2|$ in the exchange integral.

The whole family of DFT methods is sometimes visualized in the form of the so-called Jacob's ladder proposed by Perdew. The consecutive rungs, from top to bottom, are

virtual orbitals (e.g., RPA)
hybrid: HF exchange
meta-GGA: $\nabla^2\rho$, $\tau$
GGA: $\nabla\rho$
LDA: $\rho(\mathbf r)$

The top rung consists of theories that use virtual orbitals. As discussed earlier, the KS method is not a true DFT method since it uses orbitals. However, all the rungs but the top one use only the occupied orbitals. Although the use of orbitals is numerically more costly than the use of the density only, the restriction to occupied orbitals makes the cost increase manageable, and even hybrid meta-GGA methods are numerically much less expensive than the simplest wave function methods above the HF level. However, it is possible to use both occupied and virtual KS orbitals in many-body methods such as those discussed in the next sections. One of the simplest is the random-phase approximation (RPA), which can be viewed as a special case of the coupled cluster method with double excitations (CCD) [see Sec. IX C].
Of course, the costs of such approaches are the same as the costs of the corresponding wave function approaches. One may ask why one should use DFT in such cases. One reason is that the unperturbed problem may be closer to the exact solution than in the case when HF is used as the zeroth-order approximation. This is the case in particular for metals, where HF works poorly.

Let us now present a very brief list of the most popular functionals (the total number of DFT functionals proposed is a few hundred).

Functional   Type           Authors                        Year   Typical applications
LDA          nonempirical   Kohn and Sham                  1965   solids
BLYP         nonempirical   Becke, Lee, Yang, and Parr     1988   molecules
PBE          nonempirical   Perdew, Burke, and Ernzerhof   1996   molecules and solids
SCAN         nonempirical   Perdew et al.                         molecules and solids
B3LYP        fitted         Becke                          1993   molecules
PBE0         fitted         Adamo and Barone               1999   molecules
M06-2X       fitted         Truhlar et al.                        molecules

The second and third functionals belong to the GGA rung, the fourth is a meta-GGA, the fifth and sixth are hybrid GGAs, and the last one is a hybrid meta-GGA. The first four functionals can be classified as nonempirical, i.e., the parameters were fixed mostly using various exact conditions, possibly with some fitting to a limited set of data such as atomic total energies. In the fitted functionals, the parameters were adjusted by fitting DFT predictions to a set of benchmark data obtained both from experiments and from accurate calculations using wave function methods.

VII. VARIATIONAL METHOD

The variational principle provides a simple way of placing an upper bound on the ground-state energy of any quantum system and is the basis of an important class of approximation methods. We start with the inequality

$$E = \frac{\langle \psi | H | \psi \rangle}{\langle \psi | \psi \rangle} \ge E_0, \tag{61}$$

where $E$ is the expectation value of the energy, $\psi$ is an arbitrary state, and $E_0$ is the lowest eigenvalue of the Hamiltonian $H$. The proof of this claim is as follows. First, let us assume that the arbitrary state is normalized, i.e., $\langle \psi | \psi \rangle = 1$. If we expand $|\psi\rangle = \sum_n c_n |n\rangle$ in the orthonormal eigenstates $|n\rangle$ of $H$ with eigenvalues $E_n$, with $\sum_n |c_n|^2 = 1$ to ensure normalization, then we can write for the expectation value of the energy

$$E = \sum_{n,m=0} c_m^* c_n \langle m | H | n \rangle = \sum_{n,m=0} c_m^* c_n E_n \delta_{mn} = \sum_{n=0} |c_n|^2 E_n,$$

$$E = E_0 \sum_{n=0} |c_n|^2 + \sum_{n=0} |c_n|^2 (E_n - E_0) \ge E_0.$$

In the case of a non-degenerate ground state, we have an equality only if $|c_0| = 1$, which implies that $c_n = 0$ for all $n \neq 0$. If we consider a family of states $\psi(\alpha)$ which depend on some number of parameters $\alpha_i$, we can define

$$E(\alpha) = \frac{\langle \psi(\alpha) | H | \psi(\alpha) \rangle}{\langle \psi(\alpha) | \psi(\alpha) \rangle} \ge E_0.$$
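As a concrete instance of a parameterized family $\psi(\alpha)$, consider a Gaussian trial function $e^{-\alpha r^2}$ for the hydrogen atom. This example is not taken from the notes; it is a standard textbook exercise in which, in atomic units, $E(\alpha) = \frac{3}{2}\alpha - 2\sqrt{2\alpha/\pi}$ and the exact ground-state energy is $-0.5$. The sketch below minimizes $E(\alpha)$ with a simple golden-section search (the helper `golden_section_minimum` is hypothetical) and compares the result with the closed-form minimum at $\alpha = 8/(9\pi)$.

```python
import math

E_EXACT = -0.5  # exact hydrogen ground-state energy in hartree


def energy(alpha):
    """E(alpha) for the trial function exp(-alpha r^2): kinetic + potential."""
    return 1.5 * alpha - 2.0 * math.sqrt(2.0 * alpha / math.pi)


def golden_section_minimum(f, a, b, tol=1e-10):
    """Locate the minimum of a unimodal function f on the interval [a, b]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    while abs(b - a) > tol:
        c = b - g * (b - a)
        d = a + g * (b - a)
        if f(c) < f(d):
            b = d      # minimum lies in [a, d]
        else:
            a = c      # minimum lies in [c, b]
    return 0.5 * (a + b)


alpha_opt = golden_section_minimum(energy, 0.01, 2.0)
e_opt = energy(alpha_opt)

print("optimal alpha :", alpha_opt, " closed form:", 8.0 / (9.0 * math.pi))
print("E(alpha_opt)  :", e_opt, " closed form:", -4.0 / (3.0 * math.pi))
print("above exact by:", e_opt - E_EXACT)
```

The optimized energy stays above the exact value, as the variational principle requires, and the gap reflects the inability of a single Gaussian to reproduce the exponential decay of the true wave function.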

The relation $E(\alpha) \ge E_0$ still holds for all values of the parameters $\alpha$. The lowest upper bound on the ground-state energy within the family is then obtained from the minimum value of $E(\alpha)$ over the range of parameters $\alpha$, i.e., by setting the first derivatives to zero,

$$\left. \frac{\partial E}{\partial \alpha_i} \right|_{\alpha = \alpha_k} = 0,$$

giving us the upper bound $E_0 \le E(\alpha_k)$. Unfortunately, the variational method does not tell us how far above the ground state $E(\alpha_k)$ lies. Despite this limitation, when the set of states $\psi(\alpha)$ is chosen fairly close to the ground state, the variational method can give remarkably accurate results.

A. Configuration Interaction (CI) method

The basic idea of Configuration Interaction (CI) is to diagonalize the N-electron Hamiltonian in a basis of N-electron functions, namely Slater determinants. Essentially, we represent the exact wave function as a linear combination of N-electron trial functions and then use the variational method to minimize the energy. If a complete basis were used, we would obtain the exact energies of both the ground state and all excited states of the system. In principle, this provides an exact solution to the many-electron problem; in practice, however, only a finite set of N-electron trial functions is manageable, so the CI wavefunction expansion is typically truncated at a specific excitation level. As a result of the size restrictions on practical CI calculations, CI provides only upper bounds to the exact energies.

The CI wavefunction is a linear combination of known Slater determinants $\Phi_i$ with unknown coefficients. This allows us to write the eigenvectors of our Hamiltonian as

$$\Psi_j = \sum_{i} c_{ij} \Phi_i.$$

Generally, the Slater determinants are constructed from excitations of the Hartree-Fock "reference" determinant $\Phi_0$:

$$\Psi = c_0 \Phi_0 + \sum_{r,a} c_a^r \Phi_a^r + \sum_{r<s,\, a<b} c_{ab}^{rs} \Phi_{ab}^{rs} + \sum_{r<s<t,\, a<b<c} c_{abc}^{rst} \Phi_{abc}^{rst} + \dots, \tag{62}$$

where $\Phi_a^r$ represents the singly excited Slater determinant formed by replacing spinorbital $\phi_a$ with $\phi_r$.
Similarly, $\Phi_{ab}^{rs}$ represents the doubly excited Slater determinant formed by replacing spinorbital $\phi_a$ with $\phi_r$ and spinorbital $\phi_b$ with $\phi_s$, and so on for higher excitations. Every N-electron Slater determinant is formed from a set of N spinorbitals, $\{\phi_i\}_{i=1}^{N}$. We can rewrite Eq. (62) in a more general form, $\Psi_{CI} = \sum_{i=0} c_i \Phi_i$, where $i = 0$ refers to our reference Hartree-Fock wavefunction, $i = 1$ refers to our singly excited

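state wavefunction, and so on. We now optimize our total CI wavefunction via the Ritz variational method:

$$E = \frac{\langle \Psi_{CI} | \hat H | \Psi_{CI} \rangle}{\langle \Psi_{CI} | \Psi_{CI} \rangle}.$$

If we then expand the CI wavefunction in a linear combination of our Slater determinants, we get

$$E = \frac{\sum_{i} \sum_{k} c_i^* c_k \langle \Phi_i | \hat H | \Phi_k \rangle}{\sum_{i} \sum_{k} c_i^* c_k \langle \Phi_i | \Phi_k \rangle}.$$

The variational procedure corresponds to setting all the derivatives of the energy with respect to the expansion coefficients equal to zero. Rearranging, we get

$$E \sum_{ik} c_i^* c_k \langle \Phi_i | \Phi_k \rangle = \sum_{ik} c_i^* c_k \langle \Phi_i | \hat H | \Phi_k \rangle.$$

Assuming for simplicity real coefficients and real matrix elements, differentiation with respect to $c_j$ gives

$$\frac{\partial E}{\partial c_j} \sum_{ik} c_i c_k \langle \Phi_i | \Phi_k \rangle + 2E \sum_{i} c_i \langle \Phi_i | \Phi_j \rangle = 2 \sum_{i} c_i \langle \Phi_i | \hat H | \Phi_j \rangle + \sum_{ik} c_i c_k \frac{\partial \langle \Phi_i | \hat H | \Phi_k \rangle}{\partial c_j}.$$

The first term vanishes at the minimum of the energy, and the last term vanishes since the matrix elements do not depend on the coefficients. Since the basis functions are orthonormal, we obtain

$$E \sum_{i} c_i \delta_{ij} = \sum_{i} c_i \langle \Phi_i | \hat H | \Phi_j \rangle,$$

$$\sum_{i} \left( H_{ij} c_i - E \delta_{ij} c_i \right) = 0,$$

where $H_{ij} = \langle \Phi_i | \hat H | \Phi_j \rangle$. Since there is one equation for each $j$, we can transform this set of equations into a matrix equation:

$$(\mathbf H - E\, \mathbf I)\, \mathbf c = 0, \qquad \mathbf H \mathbf c = E \mathbf c,$$

$$\begin{pmatrix} H_{00} - E & H_{01} & \cdots & H_{0j} & \cdots \\ H_{10} & H_{11} - E & \cdots & H_{1j} & \cdots \\ \vdots & \vdots & \ddots & \vdots & \\ H_{j0} & H_{j1} & \cdots & H_{jj} - E & \cdots \\ \vdots & \vdots & & \vdots & \ddots \end{pmatrix} \begin{pmatrix} c_0 \\ c_1 \\ \vdots \\ c_j \\ \vdots \end{pmatrix} = 0. \tag{63}$$

Solving these secular equations is equivalent to diagonalizing the CI matrix. The CI energy is then obtained as the lowest eigenvalue of the CI matrix, and the corresponding eigenvector contains the coefficients $c_i$ multiplying the determinants in Eq. (62). The second lowest eigenvalue corresponds to the first excited state, the third lowest to the second excited state, and so on.

We have mentioned that the CI expansion is typically truncated at a specific excitation level. From studying the Slater-Condon rules, we know that only singly and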

doubly excited states can interact directly with the reference state, because matrix elements between determinants that differ in more than two spinorbitals vanish. Due to Brillouin's theorem, the matrix elements $\langle S | H | \Phi_0 \rangle$ are zero. The structure of the CI matrix in the basis of the HF determinant and its excitations, with rows and columns ordered as $\Phi_0, S, D, T, Q, \dots$, is then

$$\mathbf H = \begin{pmatrix} \langle \Phi_0 | H | \Phi_0 \rangle & 0 & \langle \Phi_0 | H | D \rangle & 0 & 0 & \cdots \\ 0 & \langle S | H | S \rangle & \langle S | H | D \rangle & \langle S | H | T \rangle & 0 & \cdots \\ \langle D | H | \Phi_0 \rangle & \langle D | H | S \rangle & \langle D | H | D \rangle & \langle D | H | T \rangle & \langle D | H | Q \rangle & \cdots \\ 0 & \langle T | H | S \rangle & \langle T | H | D \rangle & \langle T | H | T \rangle & \langle T | H | Q \rangle & \cdots \\ 0 & 0 & \langle Q | H | D \rangle & \langle Q | H | T \rangle & \langle Q | H | Q \rangle & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}, \tag{64}$$

where $\Phi_0$ is the Hartree-Fock reference state, $S$ denotes the singly excited determinants, $D$ the doubly excited determinants, and so on. The blocks $\langle X | H | Y \rangle$ which are not necessarily zero may still be sparse, meaning that most of their elements are zero. Let us look at a matrix element belonging to the block $\langle D | H | Q \rangle$. The matrix element $\langle \Phi_{ab}^{rs} | H | \Phi_{cdef}^{tuvw} \rangle$ will be nonzero only if $\phi_a$ and $\phi_b$ are contained in the set $\{\phi_c, \phi_d, \phi_e, \phi_f\}$ and if $\phi_r$ and $\phi_s$ are contained in the set $\{\phi_t, \phi_u, \phi_v, \phi_w\}$. The task at hand is then to calculate each matrix element and to diagonalize the CI matrix.

As we include more and more excitations in the CI expansion, we capture more and more electron correlation. CI also needs large one-electron basis sets in order to capture the correlation energy accurately. We can thus increase the size of the CI matrix by adding more excited configurations or by increasing the basis set size. However, doing either is very expensive. If the number of spinorbitals produced by HF is $2M$, the number of determinants that can be constructed is $\binom{2M}{N}$, where $N$ is the number of electrons. Taking into account all possible excitations in the expansion is known as Full CI (FCI), and the cost of this method grows factorially with the size of the system. Because of the cost of Full CI, what is usually done is to restrict the expansion to low excitation levels and truncate the CI matrix; e.g., CI Doubles (CID) includes only the reference determinant and the double excitations.
Since, by Brillouin's theorem, the single excitations do not couple directly to the HF determinant, the most significant contribution to the correlation energy must come from the double excitations, which are the first excitations coupled to the HF Slater determinant. This gives a reduced matrix which is much more feasible for practical computation; however, it introduces another problem: the lack of size extensivity.

1. Size extensivity of CI

A method is said to be size extensive if the calculated energy scales linearly with the number of particles $N$, i.e., the word "extensive" is used in the same sense as in

thermodynamics. The truncated CI will introduce errors in the wave function, which will in turn cause errors in the energy and all other properties. A particular result of truncating the N-electron basis is that the CI energies obtained are no longer size extensive.

Let us show that CI is not size extensive through an example. Consider two noninteracting hydrogen molecules (H$_2$). We expect the total energy of the two molecules to be the sum of the energies of the individual molecules, i.e., $E(2\mathrm{H}_2) = 2E(\mathrm{H}_2)$. Using CID for a single H$_2$ molecule in the minimal basis considered here results in the exact energy within that basis; however, if we use the CID method for the pair of molecules, the energy of the two molecules at large separation will not be the same as the sum of their energies calculated separately. The exact wavefunction for this system has the form

$$\Psi = \mathcal A\, \psi_a \psi_b = \mathcal A \left( a_0\, \sigma_a^2 + a_2\, \sigma_a^{*2} \right) \left( a_0\, \sigma_b^2 + a_2\, \sigma_b^{*2} \right) = \mathcal A \left( a_0^2\, \sigma_a^2 \sigma_b^2 + a_0 a_2\, \sigma_a^2 \sigma_b^{*2} + a_0 a_2\, \sigma_a^{*2} \sigma_b^2 + a_2^2\, \sigma_a^{*2} \sigma_b^{*2} \right),$$

where $\mathcal A$ is the antisymmetrization operator, $\sigma^2$ denotes the configuration with both electrons of a molecule in the ground-state orbital, and $\sigma^{*2}$ denotes the configuration with both electrons in the excited orbital. Notice that the last term in the expansion contains double excitations on both molecules. This is a quadruply excited configuration, which is truncated out in the CID calculation. In order to account for the missing energy, we would have to include quadruply excited determinants in the CI basis, since local double excitations can happen simultaneously on both subsystems. It is clear that the fraction of the correlation energy recovered by a truncated CI diminishes as the size of the system increases, making it a progressively less accurate method.
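The two-molecule argument can be turned into a small numerical experiment. The sketch below is a toy model, not a real H$_2$ calculation: each monomer is represented by a 2x2 Hamiltonian in the basis of its ground and doubly excited configurations, with a hypothetical excitation energy `DELTA` and coupling `K`, and the noninteracting dimer Hamiltonian is the sum of the two monomer Hamiltonians. Removing the configuration with both monomers excited (a quadruple excitation) mimics CID. The helper `lowest_eigenvalue` is a plain shifted power iteration written for this illustration.

```python
import math


def lowest_eigenvalue(h, iters=2000):
    """Lowest eigenvalue of a small real symmetric matrix.

    Uses power iteration on (shift * I - h), whose dominant eigenvector
    is the ground state of h, followed by a Rayleigh quotient.
    """
    n = len(h)
    shift = max(sum(abs(x) for x in row) for row in h)  # bound on |eigenvalues|
    v = [1.0 / (i + 1) for i in range(n)]
    for _ in range(iters):
        w = [shift * v[i] - sum(h[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    hv = [sum(h[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * hv[i] for i in range(n))


DELTA = 1.0  # hypothetical energy of the doubly excited configuration
K = 0.2      # hypothetical coupling between ground and excited configuration

# Monomer basis: (ground configuration, doubly excited configuration).
monomer = [[0.0, K],
           [K, DELTA]]

# Dimer basis: (both ground, B excited, A excited, both excited).
dimer_fci = [[0.0, K, K, 0.0],
             [K, DELTA, 0.0, K],
             [K, 0.0, DELTA, K],
             [0.0, K, K, 2.0 * DELTA]]

# CID for the dimer: drop the "both excited" (quadruple) configuration.
dimer_cid = [row[:3] for row in dimer_fci[:3]]

e_monomer = lowest_eigenvalue(monomer)
e_dimer_fci = lowest_eigenvalue(dimer_fci)
e_dimer_cid = lowest_eigenvalue(dimer_cid)

print("2 x monomer energy:", 2.0 * e_monomer)
print("dimer, full CI    :", e_dimer_fci)
print("dimer, CID        :", e_dimer_cid)
```

In this model the full CI energy of the dimer equals twice the monomer energy, whereas the truncated expansion gives a higher energy, which is the size-extensivity error discussed above.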
If we nevertheless truncate the CI expansion, we should realize that the spinorbitals in the Slater determinants come from the HF method, so we should allow those orbitals to re-optimize as we take linear combinations of the determinants. We should also consider, for example, not exciting electrons from the inner-shell orbitals, since the computational cost of including those excitations is large while their effect on energy differences is small. We can neglect these excitations by "freezing" the core orbitals and performing CI only in the higher orbitals.

2. MCSCF, CASSCF, RASSCF, and MRCI

The Multi-Configurational Self-Consistent Field (MCSCF) method is another approach related to the CI method, in which we decide on a set of determinants that can sufficiently describe our system. Each of the determinants is constructed from spinorbitals that are not fixed, but optimized so as to lower the total energy as much as possible. The main idea here is to use the variational principle to not only optimize the coefficients in front of the

determinants, but also the spinorbitals used to construct the determinants. In a sense, the MCSCF method is a combination of the CI method and the HF method (if the number of determinants chosen is just 1, we get back the HF method).

The classical MCSCF approach follows the Ritz variational method described before very closely. We start with the MCSCF wavefunction, which has the form of a finite linear combination of Slater determinants $\Phi_I$,

$$\Psi_{MCSCF} = \sum_{I} c_I \Phi_I,$$

where the $c_I$ are the variational coefficients. First, we calculate the coefficients of the determinants using the variational method, without changing the determinants. Next, we vary the orbital coefficients inside the determinants at fixed CI coefficients to obtain the best determinants. Finally, we repeat the procedure by going back and expanding the MCSCF wavefunction in terms of the newly optimized determinants.

The MCSCF method is mainly used to generate a qualitatively correct wavefunction, i.e., to recover the "static" part of the correlation. The goal is usually not to recover a large fraction of the total correlation energy, but to recover all the changes that occur in the correlation energy for a given process. A major problem that this procedure faces is figuring out which configurations are necessary to include for the property of interest.

The Complete Active Space Self-Consistent Field (CASSCF) method is a special case of the MCSCF method. Starting from the molecular orbitals computed with HF, we partition the space of these orbitals into an active and an inactive space. The inactive spinorbitals are chosen from the low-energy orbitals, i.e., the orbitals that are doubly occupied in all determinants (inner shells). The remaining spinorbitals belong to the active space. Within the active space, we consider all possible occupancies and excitations of the active spinorbitals to obtain the set of determinants in the expansion of the MCSCF wavefunction (hence "complete").
A common notation used for CASSCF is the following: [n,m]-CASSCF, where n is the number of electrons distributed in all possible ways among m orbitals. For example, [11,8]-CASSCF for the molecule NO pertains to the problem of 11 valence electrons being distributed among all configurations that can be constructed from 8 molecular orbitals.

Because the expansion within the active space is a full CI expansion, CASSCF quickly becomes too large to be useful, even for moderately sized active spaces. To overcome this problem, a variation called the Restricted Active Space Self-Consistent Field (RASSCF) method is used. In the RASSCF method, the active orbitals are divided into three subspaces: RAS1, RAS2, and RAS3. Each of these subspaces has restrictions on the excitations allowed. A typical example is one where RAS1 includes orbitals that are occupied in the HF reference determinant and from which excitations are made, RAS2 includes orbitals treated at the full CI level or limited to SDTQ excitations, and RAS3 includes virtual orbitals that are empty in the HF determinant. The full CI expansion within the active space severely restricts the number of orbitals

and electrons that can be treated by CASSCF methods. Any configurations additional to those from the RAS2 space can be generated by allowing excitations from one space to another, for example by allowing 2 electrons to be excited from RAS1 to RAS3. In essence, a typical RASSCF calculation generates configurations by a combination of a full CI in a small number of orbitals in RAS2 and a CISD in a somewhat larger orbital space in RAS1 and RAS3.

Excitation energies from truncated CI methods such as the ones described above are generally too high, since the excited states are not as well correlated as the ground state. To obtain equally well correlated ground and excited states, one can use a method called Multi-Reference Configuration Interaction (MRCI), which uses more than one reference determinant from which singly, doubly, and higher excited determinants are generated (the set of reference determinants is called the model space). MRCI also gives a better correlated ground state, which is important if the system under consideration has more than one dominant determinant, since some higher excited determinants are then also taken into the CI space. The CI expansion is obtained by replacing the spinorbitals in the model-space determinants by virtual orbitals.

B. Basis sets and basis set convergence

The standard wave functions used in solving the Schrödinger equation for atoms and molecules are constructed from antisymmetrized products of spinorbitals. In most methods, these spinorbitals are generated by expanding them in a finite set of simple basis functions. The choice of basis functions for a molecular calculation is therefore important and depends on the system we wish to analyze. There are hundreds of basis sets that can be used, each optimized for specific purposes. The most general types of basis functions are Slater-type orbitals (STO) and Gaussian-type orbitals (GTO).
Here, we will consider Thom Dunning's correlation-consistent basis sets, which were designed for converging post-HF calculations systematically to the complete basis set limit using extrapolation techniques. Correlation-consistent basis sets are built by adding functions corresponding to electron shells to a core set of HF functions.

What we need for carrying out accurate correlated calculations is not only a set of spinorbitals that resemble as closely as possible the occupied orbitals of the atomic systems, but also a set of virtual correlating orbitals into which the correlated electrons can be excited. An obvious candidate here are the canonical orbitals from the HF calculations; however, since the lowest virtual HF orbitals are very diffuse, they are not well suited for correlating the ground-state electrons, except when the full set of orbitals is used. Another strategy is to generate correlating atomic orbitals for molecular calculations by relying on the energy criterion alone, i.e., to adjust the exponents of the correlating orbitals so as to maximize their contribution to the correlation energy.

By doing this, we should be able to generate sets of correlating orbitals that are more compact, i.e., contain fewer primitive basis functions. In these correlation-consistent basis sets, each correlating orbital is represented by a single primitive chosen so as to maximize its contribution to the correlation energy, and all correlating orbitals that make similar contributions to the correlation energy are added simultaneously. A hierarchy of basis sets can then be set up that is correlation-consistent in the sense that each basis set contains all correlating orbitals that lower the energy by comparable amounts as well as all orbitals that lower the energy by larger amounts. The main advantage of this method is that it allows us to employ smaller primitive sets. Correlation-consistent basis sets were designed to converge systematically to the complete basis set limit using extrapolation techniques.

Let us consider the structure of the correlation-consistent basis sets in more detail. We will start with the two main families of basis sets, cc-pVXZ and cc-pCVXZ, where X = D, T, Q, 5, 6, 7, ... Here, cc-p stands for correlation-consistent polarized, and V and CV stand for valence and core-valence, respectively. The letter p indicates the presence of polarization functions in the basis set. XZ denotes the zeta level (double zeta, triple zeta, and so on), which tells us how many basis functions are used for each valence atomic orbital. As we increase X, we add functions of higher angular momentum, which span more of the angular space. The basis functions are added in shells; e.g., for the C atom, cc-pVDZ consists of [3s2p1d], cc-pVTZ consists of [4s3p2d1f], and cc-pVQZ consists of [5s4p3d2f1g].
The main difference between the two families is that the cc-pCVXZ basis sets extend the standard cc-pVXZ sets with additional functions that provide flexibility in the core region. The prefix aug can be added to either family to indicate that one set of diffuse functions is added for every angular momentum present in the basis, improving the flexibility in the outer valence region. As the number of basis functions increases, the wavefunction becomes better represented and the energy decreases, approaching the complete-basis-set (CBS) limit. An infinite number of basis functions cannot be employed in practice, but we can try to estimate the energy at the CBS limit. By using hierarchical basis sets, i.e., correlation-consistent sets with adjacent cardinal numbers, we can calculate the energy at a few points and then extrapolate to the energies expected in larger basis sets. If we look at the dependence of the HF energy on the basis set size, we see that the error in the HF energy decreases approximately exponentially with the cardinal number X. The error in the correlation energy scales differently, as $\Delta E \propto X^{-3}$. This allows us to carry out calculations at, for example, the Dζ and Tζ levels and fit the energies on a logarithmic plot of the energies vs. X. The fitted line can then be used to extrapolate the energies to higher ζ, or even to the CBS limit.
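The extrapolation of the correlation energy can be sketched in a few lines. The sketch below assumes the two-parameter form $E_X = E_{\rm CBS} + A X^{-3}$, which is one common way to use the $X^{-3}$ scaling quoted above; the function name `cbs_two_point` and all numerical values are hypothetical and serve only as a synthetic self-check.

```python
def cbs_two_point(e_x: float, x: int, e_y: float, y: int) -> float:
    """Solve E_X = E_CBS + A * X**-3 and E_Y = E_CBS + A * Y**-3 for E_CBS."""
    if x == y:
        raise ValueError("the two cardinal numbers must differ")
    return (x**3 * e_x - y**3 * e_y) / (x**3 - y**3)


# Synthetic model data (not real calculations): energies that follow the
# assumed X**-3 law exactly, so the extrapolation must recover E_CBS_MODEL.
E_CBS_MODEL = -0.300
A_MODEL = 0.080


def model_energy(x: int) -> float:
    return E_CBS_MODEL + A_MODEL * x**-3


estimate = cbs_two_point(model_energy(2), 2, model_energy(3), 3)
print(f"extrapolated correlation energy: {estimate:.6f}")
```

With real data the two energies would come from, e.g., cc-pVDZ (X = 2) and cc-pVTZ (X = 3) calculations; the HF energy converges differently and is usually extrapolated separately.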

C. Explicitly-correlated methods

In this section, we will consider methods that utilize wavefunctions depending explicitly on the interelectronic distance $r_{12}$. Such explicitly-correlated wavefunctions converge much faster than the conventional CI expansion and dramatically improve the accuracy of the energy. Recall that in the HF method we neglected the correlation between the motions of the electrons, i.e., the HF wavefunction does not depend explicitly on $r_{12}$ near $r_{12} = 0$. As a result, the HF method overestimates the probability of finding two electrons close together and thus overestimates the electron repulsion energy. To account for the correlation between the electrons, we must somehow incorporate the interelectronic distance into the wavefunction. However, explicitly correlated methods do bring a couple of problems. First, the resulting algorithms are much more difficult to implement. Second, they are incompatible with concepts such as orbitals and electron configurations, since they avoid the one-electron approximation from the very beginning.

1. Coulomb cusp

We will consider the behavior of the exact wavefunction for coinciding particles, where the electronic Hamiltonian becomes singular and gives rise to a cusp in the wavefunction. For simplicity, we will examine the ground state of He, for which we can easily generate accurate approximations to the true wavefunction. The Hamiltonian of He is
\[
H = -\frac{1}{2}\nabla_1^2 - \frac{1}{2}\nabla_2^2 - \frac{2}{r_1} - \frac{2}{r_2} + \frac{1}{|\mathbf{r}_1 - \mathbf{r}_2|} .
\]
The singularities of this Hamiltonian occur for $r_1 = 0$, $r_2 = 0$, or $\mathbf{r}_1 - \mathbf{r}_2 = 0$. At these points, the exact solution of the Schrödinger equation must provide contributions to $H\Psi$ that balance the singularities in $H$, so that the local energy $H\Psi/\Psi$ remains constant and equal to the energy eigenvalue $E$. The only possible source of this balancing is the kinetic energy term.
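It is convenient to express the Hamiltonian in terms of the relative coordinates $r_1$, $r_2$, and $r_{12}$, where $r_1$ and $r_2$ are the distances of the electrons from the nucleus and $r_{12}$ is the interelectronic distance. Doing so, we get
\[
H = -\frac{1}{2}\sum_{i=1}^{2}\left(\frac{\partial^2}{\partial r_i^2} + \frac{2}{r_i}\frac{\partial}{\partial r_i} + \frac{2Z}{r_i}\right)
- \left(\frac{\partial^2}{\partial r_{12}^2} + \frac{2}{r_{12}}\frac{\partial}{\partial r_{12}} - \frac{1}{r_{12}}\right)
- \left(\frac{\mathbf{r}_1}{r_1}\cdot\frac{\mathbf{r}_{12}}{r_{12}}\frac{\partial}{\partial r_1} + \frac{\mathbf{r}_2}{r_2}\cdot\frac{\mathbf{r}_{21}}{r_{21}}\frac{\partial}{\partial r_2}\right)\frac{\partial}{\partial r_{12}} .
\]
The solution of the Schrödinger equation must be well behaved, so the singularities must cancel, which leads to nuclear and interelectronic cusps. For the singularities to cancel, the terms that multiply $r_i^{-1}$ and $r_{12}^{-1}$ must cancel in $H\Psi$. We will look only at the electron-electron cusp, for which the terms with $r_{12}^{-1}$ must vanish in $H\Psi$. From the second term of the Hamiltonian, we find that this leads to
\[
\left.\frac{\partial\Psi}{\partial r_{12}}\right|_{r_{12}=0} = \frac{1}{2}\,\Psi(r_{12}=0)
\]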

which describes the behavior of the wavefunction when the electrons coincide and represents the electron-electron cusp condition. This cusp condition is impossible to fulfill using orbital-based wave functions. If we write the FCI expansion for He in terms of Slater-type orbitals, we get
\[
\Psi_{\rm FCI} = e^{-\zeta(r_1+r_2)}\sum_{ijk} c_{ijk}\left(r_1^i r_2^j + r_1^j r_2^i\right) r_{12}^{2k},
\]
where the summation is over all nonnegative integers. This FCI expansion thus contains all possible combinations of powers of $r_1$ and $r_2$ and of even powers of $r_{12}$. The wavefunction now includes the interelectronic distance $r_{12}$; however, since only even powers of $r_{12}$ are present, the cusp condition can never be satisfied. The missing cusp in the wavefunction leads to slow convergence of CI with respect to the basis set. This is an intrinsic problem shared by all wavefunction expansions in orbital products. In order to fix this problem and gain faster convergence, we introduce an explicit linear dependence on $r_{12}$ into the wavefunction,
\[
\Psi^{r_{12}}_{\rm CI} = \left(1 + \frac{1}{2} r_{12}\right)\Psi_{\rm CI} .
\]
Now if we take the derivative, we get
\[
\left.\frac{\partial\Psi^{r_{12}}_{\rm CI}}{\partial r_{12}}\right|_{r_{12}=0} = \frac{1}{2}\Psi_{\rm CI}(r_{12}=0) = \frac{1}{2}\Psi^{r_{12}}_{\rm CI}(r_{12}=0),
\]
which satisfies the Coulomb cusp condition exactly (the derivative of $\Psi_{\rm CI}$ itself vanishes at $r_{12}=0$ because it contains only even powers of $r_{12}$). In general, we may impose the correct Coulomb cusp behavior on any determinant-based wave function $\Phi$ by multiplying the expansion by a correlating function $\gamma$,
\[
\gamma = 1 + \frac{1}{2}\sum_{i<j} r_{ij},
\]
which leads to the correct nondifferentiable cusp in the product function $\gamma\Phi$. A proof of this is assigned as a homework problem. However, the correct cusp behavior alone does not guarantee that the associated improvements in the energy are significant. If we plot the helium ground-state energy as a function of the number of terms in the expansion, we see that introducing the single $r_{12}$ term reduces the error by 2 orders of magnitude. In order to converge the CI-R12 energy even more accurately, we need an even more flexible wave function.

2. Hylleraas function

The Hylleraas function is one such function. Hylleraas was the first to succeed in constructing an accurate wavefunction for the singlet S state of the helium atom. If we generalize the FCI expansion in STOs to include all powers of $r_{12}$, we obtain
\[
\Psi_{\rm H} = e^{-\zeta(r_1+r_2)}\sum_{ijk} c_{ijk}\left(r_1^i r_2^j + r_2^i r_1^j\right) r_{12}^{k},
\]
which is usually expressed as
\[
\Psi_{\rm H} = e^{-\zeta s}\sum_{ijk} c_{ijk}\, s^i t^{2j} u^k,
\]
where $s$, $t$, and $u$ are the so-called Hylleraas coordinates,
\[
s = r_1 + r_2, \qquad t = r_1 - r_2, \qquad u = r_{12} .
\]
The Hylleraas function is usually truncated according to $i + 2j + k \le N$; it nevertheless gives very high accuracy with only a few terms, especially for helium. This function is applicable only to few-electron atomic systems, since its complexity increases dramatically with the number of electrons.

3. Slater geminals

Geminals, or two-electron functions, are another type of explicitly correlated functions; they generalize single-electron orbitals and account for intra-orbital correlation effects. The wavefunction is expanded in two-electron basis functions in addition to orbital products. The primary cusp condition suggests that such an expansion is effective for geminal basis functions with the asymptotic behavior
\[
f_{12} = \frac{1}{2} r_{12} + O(r_{12}^2) .
\]
Including these $f_{12}$ functions requires two-electron integrals for operators involving $f_{12}$ and $r_{12}^{-1}$, such as
\[
K^{(Q)}_{12} = (\nabla_1 f_{12})\cdot(\nabla_1 f_{12}) .
\]
Most explicitly correlated methods have employed basis functions such as the linear $r_{12}$ (R12) or Gaussian-type geminals (GTG),
\[
f^{\rm R12}_{12} = \frac{1}{2} r_{12}, \qquad f^{\rm GTG}_{12} = \sum_{G}^{N_G} c_G\, e^{-\zeta_G r_{12}^2} .
\]
A downside of R12 functions is that the associated energies do not always cover a sufficient fraction of the correlation energy. GTGs do not suffer from such a problem at large $r_{12}$; however, they never fulfill the cusp condition exactly. Despite this, a modest number of GTGs can still represent a suitable range of $r_{12}$ accurately. The main disadvantage is that the computation of the integrals involved can become relatively costly, especially for operators quadratic in $f_{12}$, which involve $N_G^2/2$ primitive operations. Slater-type geminals (STG), or Slater geminals, of the form
\[
f^{\rm STG}_{12} = -\frac{r_c}{2}\, e^{-r_{12}/r_c},
\]
where $r_c$ is a scale-length parameter, remedy the above problems of GTGs. These functions use the STO form as a geminal basis function to incorporate the interelectronic distance. The STG reduces the quadratic operators to exponential forms, i.e.,
\[
K^{(Q)}_{12} = \frac{1}{4} e^{-2 r_{12}/r_c}, \qquad \left(f^{\rm STG}_{12}\right)^2 = r_c^2\, K^{(Q)}_{12} .
\]
It turns out that STGs provide better results than GTG and R12 methods. For example, results of at least 5ζ quality are obtained in a Tζ basis when STGs are used. From a computational point of view, STGs are also more efficient due to their compact and short-range form.

4. Explicitly-correlated Gaussian functions

Explicitly correlated Gaussians (ECG) were proposed to describe N-particle wavefunctions with a basis of exponential functions whose argument involves the squares of the interelectronic distances,
\[
\psi_{\rm ECG} = A\, e^{-\sum_{i<j}^{N} \alpha_{ij} (\mathbf{r}_i - \mathbf{r}_j)^2},
\]
where $\alpha_{ij}$ are adjustable parameters. These functions were called Gaussian-type geminals (GTG) in their earliest, two-electron version. At first, functions with exponential correlation factors were underestimated and claimed to converge much more slowly than correlating functions containing powers of $r_{12}$. It was later shown that careful optimization of the nonlinear parameters allows very short expansions of high quality for certain molecules, such as H$_2$. The main advantage of these functions is that they lead to very simple integrals, which makes them easily applicable to general many-center molecules. The integrals are no more complicated than ordinary Gaussian integrals, involving only the exponential function and the well-known gamma function.
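The cusp behavior of the three correlation factors discussed above (linear $r_{12}$, Slater-type geminal, Gaussian-type geminal) can be compared numerically. The following sketch estimates the slope $\partial f_{12}/\partial r_{12}$ at $r_{12} = 0$ by a one-sided finite difference; the parameter values `R_C` and `ZETA_G`, the single-Gaussian GTG with unit coefficient, and the helper names are hypothetical choices made only for illustration.

```python
import math

R_C = 1.0      # hypothetical STG scale length r_c
ZETA_G = 1.0   # hypothetical Gaussian exponent for a one-term GTG


def f_r12(r: float) -> float:
    """Linear R12 factor, f = r12 / 2."""
    return 0.5 * r


def f_stg(r: float) -> float:
    """Slater-type geminal, f = -(r_c / 2) * exp(-r12 / r_c)."""
    return -0.5 * R_C * math.exp(-r / R_C)


def f_gtg(r: float) -> float:
    """Single Gaussian-type geminal with coefficient 1."""
    return math.exp(-ZETA_G * r * r)


def slope_at_zero(f, h: float = 1e-6) -> float:
    """One-sided finite-difference estimate of df/dr12 at r12 = 0."""
    return (f(h) - f(0.0)) / h


for name, f in [("R12", f_r12), ("STG", f_stg), ("GTG", f_gtg)]:
    print(f"{name}: slope at r12 = 0 is approximately {slope_at_zero(f):.5f}")
```

The R12 and STG factors reproduce the slope 1/2 required by the cusp condition, while the Gaussian factor has zero slope at coalescence, in line with the statement that GTGs never fulfill the cusp condition exactly.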
Other advantages of explicitly correlated Gaussians are that they give very high accuracy, since the basis functions are correlated; this is magnified for systems with strongly attractive interparticle interactions. The quadratic form involving $r_{ij}$ also permits the reduction of the Hamiltonian matrix elements to very simple analytic expressions, which do not gain any more algebraic complexity for $N \ge 3$. Their main disadvantages are that they are unable to describe the electron-nuclear cusp, that they vanish too quickly at large distances, and that the Gaussian correlation factor does not reproduce the electron-electron cusp, as mentioned in the previous section.

VIII. MANY-BODY PERTURBATION THEORY (MBPT)

The second-quantized formalism is perhaps most extensively utilized in the perturbation theory of many-electron systems. This is due to the tedious derivations necessary to arrive at feasible working formulae, especially at the higher orders of PT.

A. Rayleigh-Schrödinger perturbation theory (classical derivation)

Let us first review the essence of the nondegenerate Rayleigh-Schrödinger perturbation theory. Consider the time-independent Schrödinger equation,
\[
\hat H \Psi_n = E_n \Psi_n . \tag{65}
\]
Finding solutions to this equation is, in most cases, a difficult task. Assume, however, that the Hamiltonian consists of two Hermitian parts, a zero-order part and a perturbation,
\[
\hat H = \hat H_0 + \hat V . \tag{66}
\]
It is convenient to write this in the form
\[
\hat H = \hat H_0 + \lambda \hat V, \tag{67}
\]
where $\lambda$ is an "order parameter" that is used to classify the various contributions by their order. We assume that the solutions of the zeroth-order eigenvalue problem for $\hat H_0$ are known,
\[
\hat H_0 \Phi_n = E_n^{(0)} \Phi_n, \tag{68}
\]
with
\[
\langle \Phi_m | \Phi_n \rangle = \delta_{mn} . \tag{69}
\]
If $\Phi_n$ is nondegenerate, it is possible to number the solutions in such a way that
\[
\lim_{\lambda\to 0}\Psi_n = \Phi_n, \qquad \lim_{\lambda\to 0} E_n = E_n^{(0)} . \tag{70}
\]
If there are degeneracies, it is possible to choose the zero-order solutions so that (70) is still satisfied. We define the corrections to the wave function and to the energy as
\[
\chi_n = \Psi_n - \Phi_n, \qquad \Delta E_n = E_n - E_n^{(0)} . \tag{71}
\]

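Here we have partitioned $\Psi_n$ into two parts, one parallel (i.e., proportional) to $\Phi_n$ and the other orthogonal to it. It is therefore convenient to use intermediate normalization:
\[
\langle\Phi_n|\Phi_n\rangle = 1, \quad \langle\chi_n|\Phi_n\rangle = 0, \quad \langle\Psi_n|\Phi_n\rangle = \langle\Phi_n + \chi_n|\Phi_n\rangle = 1, \quad \langle\Psi_n|\Psi_n\rangle = 1 + \langle\chi_n|\chi_n\rangle . \tag{72}
\]
To proceed further, we use the order parameter $\lambda$ and expand:
\[
\begin{aligned}
\Psi_n &= \Phi_n + \chi_n = \Psi_n^{(0)} + \lambda\Psi_n^{(1)} + \lambda^2\Psi_n^{(2)} + \dots \qquad (\Psi_n^{(0)} \equiv \Phi_n), \\
E_n &= E_n^{(0)} + \Delta E_n = E_n^{(0)} + \lambda E_n^{(1)} + \lambda^2 E_n^{(2)} + \dots
\end{aligned} \tag{73}
\]
Substituting into the Schrödinger equation
\[
\left(\hat H - E_n\right)\Psi_n = 0 \tag{74}
\]
with $\hat H = \hat H_0 + \lambda\hat V$, we get
\[
\left(\hat H_0 + \lambda\hat V - E_n^{(0)} - \lambda E_n^{(1)} - \lambda^2 E_n^{(2)} - \dots\right)\left(\Psi_n^{(0)} + \lambda\Psi_n^{(1)} + \lambda^2\Psi_n^{(2)} + \dots\right) = 0 . \tag{75}
\]
Equating the coefficients of the powers of $\lambda$ gives for $\lambda^0$, $\lambda^1$, and $\lambda^2$, respectively:
\[
\left(\hat H_0 - E_n^{(0)}\right)\Psi_n^{(0)} = 0 \quad \text{(zero order)}, \tag{76}
\]
\[
\left(\hat H_0 - E_n^{(0)}\right)\Psi_n^{(1)} = \left(E_n^{(1)} - \hat V\right)\Psi_n^{(0)} \quad \text{(first order)}, \tag{77}
\]
\[
\left(\hat H_0 - E_n^{(0)}\right)\Psi_n^{(2)} = \left(E_n^{(1)} - \hat V\right)\Psi_n^{(1)} + E_n^{(2)}\Psi_n^{(0)} \quad \text{(second order)}, \tag{78}
\]
and in general, for $\lambda^m$, the $m$th-order equation
\[
\left(\hat H_0 - E_n^{(0)}\right)\Psi_n^{(m)} = \left(E_n^{(1)} - \hat V\right)\Psi_n^{(m-1)} + \sum_{l=0}^{m-2} E_n^{(m-l)}\Psi_n^{(l)}, \tag{79}
\]
which becomes
\[
\left(E_n^{(0)} - \hat H_0\right)\Psi_n^{(m)} = \hat V\Psi_n^{(m-1)} - \sum_{l=0}^{m-1} E_n^{(m-l)}\Psi_n^{(l)} . \tag{80}
\]
In order to get expressions for $E_n^{(m)}$, we apply $\langle\Phi_n|$ to each equation and integrate. For $\lambda^1$ we get
\[
\langle\Phi_n|\hat H_0 - E_n^{(0)}|\Psi_n^{(1)}\rangle = \langle\Phi_n|E_n^{(1)} - \hat V|\Phi_n\rangle . \tag{81}
\]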

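By the Hermitian property of $\hat H_0$ we have
\[
\underbrace{\langle(\hat H_0 - E_n^{(0)})\Phi_n|\Psi_n^{(1)}\rangle}_{=0} = E_n^{(1)} - \underbrace{\langle\Phi_n|\hat V|\Phi_n\rangle}_{V_{nn}} \tag{82}
\]
and so
\[
E_n^{(1)} = \langle\Phi_n|\hat V|\Phi_n\rangle = V_{nn} . \tag{83}
\]
Thus we have obtained $E_n^{(1)}$ without knowledge of $\Psi_n^{(1)}$, and the same can be done for each order $m$:
\[
\underbrace{\langle\Phi_n|E_n^{(0)} - \hat H_0|\Psi_n^{(m)}\rangle}_{=0} = \langle\Phi_n|\hat V|\Psi_n^{(m-1)}\rangle - \sum_{l=0}^{m-1} E_n^{(m-l)}\underbrace{\langle\Phi_n|\Psi_n^{(l)}\rangle}_{=\delta_{l0}}, \tag{84}
\]
giving
\[
E_n^{(m)} = \langle\Phi_n|\hat V|\Psi_n^{(m-1)}\rangle . \tag{85}
\]
Thus, in principle, we can obtain each $E_n^{(m)}$ from the previous $\Psi_n^{(m-1)}$ and then solve for $\Psi_n^{(m)}$, etc., while always maintaining $\langle\Phi_n|\Psi_n^{(m)}\rangle = 0$ $(m > 0)$.

To calculate $\Psi_n^{(m)}$ we can expand it in terms of the known zero-order solutions $\Phi_k$. This exploits the fact that the eigenfunctions of any semibounded Hermitian operator form a complete set:
\[
\Psi_n^{(m)} = \sum_k a_{kn}^{(m)}\Phi_k = \sum_k |\Phi_k\rangle\langle\Phi_k|\Psi_n^{(m)}\rangle, \qquad a_{kn}^{(m)} = \langle\Phi_k|\Psi_n^{(m)}\rangle \ \text{(to be determined)} . \tag{86}
\]
To obtain $a_{kn}^{(m)}$ we multiply the $m$th-order equation by $\langle\Phi_k|$ and integrate:
\[
\underbrace{\langle\Phi_k|E_n^{(0)} - \hat H_0|\Psi_n^{(m)}\rangle}_{(E_n^{(0)} - E_k^{(0)})\langle\Phi_k|\Psi_n^{(m)}\rangle} = \underbrace{\langle\Phi_k|\hat V|\Psi_n^{(m-1)}\rangle}_{\sum_j\langle\Phi_k|\hat V|\Phi_j\rangle\langle\Phi_j|\Psi_n^{(m-1)}\rangle} - \sum_{l=0}^{m-1} E_n^{(m-l)}\underbrace{\langle\Phi_k|\Psi_n^{(l)}\rangle}_{=a_{kn}^{(l)}} . \tag{87}
\]
Thus
\[
\left(E_n^{(0)} - E_k^{(0)}\right) a_{kn}^{(m)} = \sum_j V_{kj}\, a_{jn}^{(m-1)} - \sum_{l=0}^{m-1} E_n^{(m-l)} a_{kn}^{(l)} . \tag{88}
\]
In this equation the $l = 0$ contributions are to be interpreted as $a_{kn}^{(0)} = \langle\Phi_k|\Phi_n\rangle = \delta_{kn}$. This result provides a system of equations for the coefficients $a_{kn}^{(m)}$, to be solved order by order, but the first thing to notice is that we have no equation for $a_{nn}^{(m)}$; this coefficient is arbitrary, corresponding to the arbitrariness of adding any multiple of the zero-order solution $\Phi_n$. This arbitrariness appears for each order $\Psi_n^{(m)}$ separately. The following choice of intermediate normalization can thus be made for each order:
\[
\langle\Phi_n|\Psi_n^{(m)}\rangle = 0 \ (m > 0), \qquad a_{nn}^{(m)} = 0 \ (m > 0) .
\]
Consequently
\[
a_{nn}^{(m)} = \delta_{m0} . \tag{89}
\]
Since $a_{kn}^{(0)} = \delta_{kn}$, the first-order equation becomes:
\[
\left(E_n^{(0)} - E_k^{(0)}\right) a_{kn}^{(1)} = \sum_j V_{kj}\underbrace{a_{jn}^{(0)}}_{\delta_{jn}} - E_n^{(1)} a_{kn}^{(0)} = V_{kn} - E_n^{(1)} a_{kn}^{(0)} = V_{kn} \quad (k \ne n) . \tag{90}
\]
Thus we have the well-known result
\[
a_{kn}^{(1)} = \frac{V_{kn}}{E_n^{(0)} - E_k^{(0)}} \quad (k \ne n), \qquad \Psi_n^{(1)} = \sum_{k\ne n}\frac{V_{kn}}{E_n^{(0)} - E_k^{(0)}}\,\Phi_k . \tag{91}
\]
From this we get the second-order energy,
\[
E_n^{(2)} = \langle\Phi_n|\hat V|\Psi_n^{(1)}\rangle = \sum_k\langle\Phi_n|\hat V|\Phi_k\rangle\langle\Phi_k|\Psi_n^{(1)}\rangle = \sum_k a_{kn}^{(1)} V_{nk} = \sum_{k\ne n}\frac{V_{nk}V_{kn}}{E_n^{(0)} - E_k^{(0)}} = -\sum_{k\ne n}\frac{|V_{kn}|^2}{E_k^{(0)} - E_n^{(0)}}, \tag{92}
\]
which is also well known. This process can be continued in the same manner to higher orders, e.g.,
\[
a_{kn}^{(2)} = \frac{1}{E_n^{(0)} - E_k^{(0)}}\left(\sum_j a_{jn}^{(1)} V_{kj} - E_n^{(1)} a_{kn}^{(1)} - E_n^{(2)} a_{kn}^{(0)}\right)
= \sum_{j\ne n}\frac{V_{kj}V_{jn}}{\left(E_k^{(0)} - E_n^{(0)}\right)\left(E_j^{(0)} - E_n^{(0)}\right)} - \frac{V_{kn}V_{nn}}{\left(E_k^{(0)} - E_n^{(0)}\right)^2} \quad (k \ne n) . \tag{93}
\]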
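The first- and second-order results (83) and (92) can be checked on a small matrix model. The sketch below is an illustration only: the zero-order energies `E0` and the perturbation matrix `V` are hypothetical numbers, and a 2×2 problem is used so that the exact eigenvalue is available in closed form. The error of the second-order energy should then shrink roughly as $\lambda^3$.

```python
import math


def rspt_energies(e0, v, n=0):
    """Return (E1, E2) for state n from eqs. (83) and (92).

    e0: list of nondegenerate zero-order energies.
    v:  real symmetric perturbation matrix in the zero-order basis.
    """
    e1 = v[n][n]
    e2 = sum(
        v[n][k] * v[k][n] / (e0[n] - e0[k])
        for k in range(len(e0))
        if k != n
    )
    return e1, e2


def exact_ground_2x2(e0, v, lam):
    """Lowest eigenvalue of diag(e0) + lam * v for a real symmetric 2x2 case."""
    a = e0[0] + lam * v[0][0]
    d = e0[1] + lam * v[1][1]
    b = lam * v[0][1]
    return 0.5 * (a + d) - math.sqrt(0.25 * (a - d) ** 2 + b * b)


# Hypothetical model problem.
E0 = [0.0, 1.0]
V = [[0.1, 0.2], [0.2, -0.3]]

e1, e2 = rspt_energies(E0, V)
errors = {}
for lam in (0.1, 0.01):
    through_second_order = E0[0] + lam * e1 + lam**2 * e2
    errors[lam] = abs(exact_ground_2x2(E0, V, lam) - through_second_order)
    print(f"lambda = {lam}: error of the second-order energy = {errors[lam]:.3e}")
```

Reducing $\lambda$ by a factor of 10 reduces the error by about a factor of 1000, as expected when the first neglected term is of third order.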

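Using (93), we can write $E_n^{(3)}$ as
\[
E_n^{(3)} = \sum_{k,j\ne n}\frac{V_{nk}V_{kj}V_{jn}}{\left(E_k^{(0)} - E_n^{(0)}\right)\left(E_j^{(0)} - E_n^{(0)}\right)} - V_{nn}\sum_{k\ne n}\frac{V_{nk}V_{kn}}{\left(E_k^{(0)} - E_n^{(0)}\right)^2} . \tag{94}
\]
It is evident that, while this procedure is quite straightforward, the bookkeeping for the order-by-order generation of the wave function and energy is cumbersome.

B. Hylleraas variation principle

Hylleraas showed that the first-order wave function and the second-order energy can also be determined variationally. Let the trial wave function $\tilde\Psi_n^{(1)}$ be an approximate solution of the first-order equation. Using (77), multiplying it by $\langle\tilde\Psi_n^{(1)}|$, and integrating, we get
\[
\langle\tilde\Psi_n^{(1)}|\hat H_0 - E_n^{(0)}|\tilde\Psi_n^{(1)}\rangle = \langle\tilde\Psi_n^{(1)}|E_n^{(1)} - \hat V|\Phi_n\rangle,
\]
\[
0 = \langle\tilde\Psi_n^{(1)}|\hat H_0 - E_n^{(0)}|\tilde\Psi_n^{(1)}\rangle + \langle\tilde\Psi_n^{(1)}|\hat V - E_n^{(1)}|\Phi_n\rangle . \tag{95}
\]
To this equation we add the equation for the second-order energy,
\[
\tilde E_n^{(2)} = \langle\Phi_n|\hat V - E_n^{(1)}|\tilde\Psi_n^{(1)}\rangle . \tag{96}
\]
Adding (95) and (96) gives
\[
\tilde E_n^{(2)} = \langle\Phi_n|\hat V - E_n^{(1)}|\tilde\Psi_n^{(1)}\rangle + \langle\tilde\Psi_n^{(1)}|\hat V - E_n^{(1)}|\Phi_n\rangle + \langle\tilde\Psi_n^{(1)}|\hat H_0 - E_n^{(0)}|\tilde\Psi_n^{(1)}\rangle
= 2\,{\rm Re}\,\langle\tilde\Psi_n^{(1)}|\hat V - E_n^{(1)}|\Phi_n\rangle + \langle\tilde\Psi_n^{(1)}|\hat H_0 - E_n^{(0)}|\tilde\Psi_n^{(1)}\rangle . \tag{97}
\]
If we define the functional
\[
J_2\left[\tilde\Psi_n^{(1)}\right] = 2\,{\rm Re}\,\langle\tilde\Psi_n^{(1)}|\hat V - E_n^{(1)}|\Phi_n\rangle + \langle\tilde\Psi_n^{(1)}|\hat H_0 - E_n^{(0)}|\tilde\Psi_n^{(1)}\rangle, \tag{98}
\]
then we can write
\[
J_2\left[\tilde\Psi_n^{(1)}\right] \ge E_n^{(2)} . \tag{99}
\]
If $\tilde\Psi_n^{(1)}$ is the exact first-order correction to the wave function, it follows from (99) that
\[
J_2\left[\tilde\Psi_n^{(1)}\right] = E_n^{(2)} . \tag{100}
\]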

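Otherwise, the functional $J_2[\tilde\Psi_n^{(1)}]$ yields an upper bound for $E_n^{(2)}$. It can then be proved that the first-order equation (77) follows directly from setting the variation of the functional $J_2[\tilde\Psi_n^{(1)}]$ to zero:
\[
\begin{aligned}
\delta J_2\left[\tilde\Psi_n^{(1)}\right] &= \langle\delta\tilde\Psi_n^{(1)}|\hat V - E_n^{(1)}|\Phi_n\rangle + \langle\Phi_n|\hat V - E_n^{(1)}|\delta\tilde\Psi_n^{(1)}\rangle \\
&\quad + \langle\delta\tilde\Psi_n^{(1)}|\hat H_0 - E_n^{(0)}|\tilde\Psi_n^{(1)}\rangle + \langle\tilde\Psi_n^{(1)}|\hat H_0 - E_n^{(0)}|\delta\tilde\Psi_n^{(1)}\rangle \\
&= \left\langle\delta\tilde\Psi_n^{(1)}\right|\left\{\left(\hat V - E_n^{(1)}\right)\Phi_n + \left(\hat H_0 - E_n^{(0)}\right)\tilde\Psi_n^{(1)}\right\}\Big\rangle + {\rm c.c.}
\end{aligned} \tag{101}
\]
Requiring that $\delta J_2[\tilde\Psi_n^{(1)}] = 0$ for any variation $\delta\tilde\Psi_n^{(1)}$ and its complex conjugate, we get
\[
\left(\hat H_0 - E_n^{(0)}\right)\tilde\Psi_n^{(1)} = \left(E_n^{(1)} - \hat V\right)\Psi_n^{(0)}, \tag{102}
\]
for which $\tilde\Psi_n^{(1)} = \Psi_n^{(1)}$ is a solution (since the above relation is equivalent to the first-order equation).

Next we show that if $E_n^{(0)}$ is the lowest eigenvalue of $\hat H_0$, then $J_2[\tilde\Psi_n^{(1)}]$ is an upper bound for $E_n^{(2)}$. We take the trial wave function as
\[
\tilde\Psi_n^{(1)} = \Psi_n^{(1)} + \chi . \tag{103}
\]
Using (103) in (98) gives
\[
\begin{aligned}
J_2\left[\tilde\Psi_n^{(1)}\right] &= 2\,{\rm Re}\,\langle\Psi_n^{(1)} + \chi|\hat V - E_n^{(1)}|\Phi_n\rangle + \langle\Psi_n^{(1)} + \chi|\hat H_0 - E_n^{(0)}|\Psi_n^{(1)} + \chi\rangle \\
&= 2\,{\rm Re}\,\langle\Psi_n^{(1)}|\hat V - E_n^{(1)}|\Phi_n\rangle + 2\,{\rm Re}\,\langle\chi|\hat V - E_n^{(1)}|\Phi_n\rangle + \langle\Psi_n^{(1)}|\hat H_0 - E_n^{(0)}|\Psi_n^{(1)}\rangle \\
&\quad + 2\,{\rm Re}\,\langle\chi|\underbrace{\left(\hat H_0 - E_n^{(0)}\right)|\Psi_n^{(1)}\rangle}_{\left(E_n^{(1)} - \hat V\right)|\Phi_n\rangle} + \langle\chi|\hat H_0 - E_n^{(0)}|\chi\rangle .
\end{aligned} \tag{104}
\]
The second and fourth terms in the above equation cancel each other, so that
\[
J_2\left[\tilde\Psi_n^{(1)}\right] = J_2\left[\Psi_n^{(1)}\right] + \langle\chi|\hat H_0 - E_n^{(0)}|\chi\rangle . \tag{105}
\]
If $E_n^{(0)}$ is the lowest eigenvalue of $\hat H_0$, then the integral $\langle\chi|\hat H_0 - E_n^{(0)}|\chi\rangle$ is nonnegative, and it is zero if and only if $\chi$ is the corresponding eigenfunction. Therefore,
\[
J_2\left[\tilde\Psi_n^{(1)}\right] \ge E_n^{(2)} . \tag{106}
\]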
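The bound (106) can be illustrated on a small matrix model. The sketch below is a hypothetical example, not part of the original derivation: the trial function is expanded as $\tilde\Psi_n^{(1)} = \sum_{k\ne n} c_k\Phi_k$ with real coefficients, so that (98) reduces to $J_2 = 2\sum_k c_k V_{kn} + \sum_k c_k^2\,(E_k^{(0)} - E_n^{(0)})$ (the $E_n^{(1)}$ term drops out by orthogonality). The numbers in `E0` and `V` are made up.

```python
def hylleraas_j2(coeffs, e0, v, n=0):
    """Hylleraas functional (98) for a real trial function sum_k c_k Phi_k.

    coeffs maps the index k (k != n) to the real coefficient c_k.
    """
    linear = 2.0 * sum(c * v[k][n] for k, c in coeffs.items())
    quadratic = sum(c * c * (e0[k] - e0[n]) for k, c in coeffs.items())
    return linear + quadratic


# Hypothetical three-level model; state 0 has the lowest zero-order energy.
E0 = [0.0, 1.0, 2.5]
V = [
    [0.1, 0.2, -0.4],
    [0.2, -0.3, 0.05],
    [-0.4, 0.05, 0.2],
]

# Exact first-order coefficients (91) and second-order energy (92).
exact_coeffs = {k: V[k][0] / (E0[0] - E0[k]) for k in (1, 2)}
e2 = sum(V[0][k] * V[k][0] / (E0[0] - E0[k]) for k in (1, 2))

j2_exact = hylleraas_j2(exact_coeffs, E0, V)
j2_trial = hylleraas_j2({1: -0.1, 2: 0.1}, E0, V)  # arbitrary trial function

print(f"E2            = {e2:.6f}")
print(f"J2 (exact c)  = {j2_exact:.6f}")
print(f"J2 (trial c)  = {j2_trial:.6f}")
```

With the exact coefficients the functional reproduces $E_n^{(2)}$; any other choice gives a value above it, because the quadratic term is positive when $E_n^{(0)}$ is the lowest zero-order energy.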

Thus $J_2[\tilde\Psi_n^{(1)}]$, with an arbitrary trial function $\tilde\Psi_n^{(1)}$ containing adjustable parameters, can be used in a variational approach for finding approximations to the first-order wave function and second-order energy, and it provides an upper bound to $E_n^{(2)}$ for a state having the lowest zero-order energy (provided that $\Phi_n$ is an exact eigenfunction of $\hat H_0$).

C. Møller-Plesset perturbation theory

The role of many-body theory is to evaluate the RSPT energy expressions of the various orders, which contain many-electron wave functions, in terms of orbital contributions. The matrix elements should be expressed in terms of integrals over one-electron functions. In quantum-mechanical applications, the following points should be clarified:

1. The nonrelativistic Born-Oppenheimer many-body Hamiltonian projected onto a given basis set can be most conveniently specified in the usual second-quantized form. The underlying basis set is assumed to be orthonormal; MBPT calculations are usually performed in the molecular orbital (MO) basis, which meets this criterion.

2. The choice of the zeroth-order Hamiltonian is arbitrary; any Hermitian operator would do in principle. One wishes to choose $\hat H_0$ as close to $\hat H$ as possible in order to obtain favorable convergence properties of the perturbation series. On the other hand, $\hat H_0$ should be as simple as possible, since one should be able to diagonalize it and obtain its complete set of eigenfunctions. A practical balance between these two conflicting requirements is to choose $\hat H_0$ as the Fock operator:
\[
\hat H_0 = \hat F = \sum_p \varepsilon_p\, \hat p^\dagger \hat p \tag{107}
\]
in terms of molecular spinorbital operators $\hat p^\dagger$, $\hat p$ and orbital energies $\varepsilon_p$. With this choice, the perturbation operator $\hat V$ describes the electron correlation (the error of the Hartree-Fock approach), and the aim of the perturbation calculation is to improve the HF energy towards the exact solution of the Schrödinger equation in the same basis set. This is the so-called Møller-Plesset partitioning. The Hamiltonian in the MPPT partitioning may be written as
\[
\hat H = \hat H_0 + \hat V, \qquad \hat V = \hat H - \hat F = \underbrace{\sum_{i<j}\frac{1}{r_{ij}}}_{\hat H_2} - \underbrace{\sum_i\left[\hat J(i) - \hat K(i)\right]}_{\hat U} \tag{108}
\]

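where
\[
\hat H_2 = \frac{1}{4}\sum_{pqrs}\langle pq\|rs\rangle\{\hat p^\dagger\hat q^\dagger\hat s\hat r\} + \sum_{pqi}\langle pi\|qi\rangle\{\hat p^\dagger\hat q\} + \frac{1}{2}\sum_{ij}\langle ij\|ij\rangle \tag{109}
\]
and
\[
\hat U = \sum_{pqi}\langle pi\|qi\rangle\{\hat p^\dagger\hat q\} + \langle 0|\hat U|0\rangle = \sum_{pqi}\langle pi\|qi\rangle\{\hat p^\dagger\hat q\} + \sum_{ij}\langle ij\|ij\rangle . \tag{110}
\]
Then we can write
\[
\hat V = \hat H_2 - \hat U = \frac{1}{4}\sum_{pqrs}\langle pq\|rs\rangle\{\hat p^\dagger\hat q^\dagger\hat s\hat r\} - \frac{1}{2}\sum_{ij}\langle ij\|ij\rangle . \tag{111}
\]
In the above derivation no assumption has been made about $\hat F$. We can now assume the canonical HF case, in which (107) is valid.

3. Accepting the partitioning described by (107), the solution of the zeroth-order equation involves the solution of the Hartree-Fock problem. We have to specify the ground state and the excited many-electron states explicitly. The ground state is simply the Fermi vacuum,
\[
\Psi_0^{(0)} = |\text{Fermi vacuum}\rangle = |{\rm HF}\rangle = |0\rangle . \tag{112}
\]
The excited states can be classified according to the number of excited electrons. Singly excited states are given by
\[
\Psi_K^{(0)} = \hat a^\dagger\hat i\,|0\rangle, \tag{113}
\]
where $K$ labels the $i \to a$ excitation. Equation (113) expresses that an electron is annihilated from spinorbital $i$ and inserted into spinorbital $a$. A doubly excited state is given by
\[
\Psi_K^{(0)} = \hat a^\dagger\hat b^\dagger\hat j\hat i\,|0\rangle, \tag{114}
\]
where $K = \{i \to a,\ j \to b\}$.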

Now let us evaluate the RSPT formulae using second quantization. Starting from zeroth order, we have
\[
\hat H_0\Psi_0^{(0)} = \hat F|0\rangle = \sum_p\varepsilon_p\{\hat p^\dagger\hat p\}|0\rangle + \sum_i^{\rm occ}\varepsilon_i|0\rangle = \sum_i\varepsilon_i|0\rangle . \tag{115}
\]
Here only the hole-hole pair has contributed. So the Fermi vacuum is the zeroth-order ground-state eigenfunction of $\hat H_0$, and $\sum_i\varepsilon_i$ is the sum of the energies of the occupied orbitals, not the HF energy. The first-order contribution is given by
\[
E_0^{(1)} = \langle 0|\hat V|0\rangle . \tag{116}
\]
It follows that the energy through first order is
\[
E = E_0^{(0)} + E_0^{(1)} = \langle 0|\hat H_0|0\rangle + \langle 0|\hat V|0\rangle = \langle 0|\hat H_0 + \hat V|0\rangle = \langle 0|\hat H|0\rangle = \langle 0|\hat H_1 + \hat H_2|0\rangle . \tag{117}
\]
Using the second-quantized form of $\hat H$ mentioned previously, we can express the one-electron and two-electron parts as
\[
\hat H_1 = \hat H_{1,N} + \sum_i\langle i|\hat h|i\rangle, \qquad \hat H_2 = \hat H_{2,N} + \hat H'_{2,N} + \frac{1}{2}\sum_{ij}\langle ij\|ij\rangle . \tag{118}
\]
Inserting (118) into (117), we get
\[
E = \sum_i\langle i|\hat h|i\rangle + \frac{1}{2}\sum_{ij}\langle ij\|ij\rangle, \tag{119}
\]
\[
E = E_{\rm ref} = E_{\rm HF}, \tag{120}
\]
which is the expectation value of the full Hamiltonian with the Hartree-Fock wave function, i.e., the Hartree-Fock electronic energy. We see that, in the Møller-Plesset partitioning, the first order of perturbation theory corrects the sum of orbital energies to the true HF energy.
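The bookkeeping of the zeroth- and first-order energies can be checked with a toy calculation. The sketch below assumes canonical orbital energies of the form $\varepsilon_i = h_{ii} + \sum_j\langle ij\|ij\rangle$ over occupied spinorbitals; the function name and all integral values are hypothetical and do not correspond to any real system.

```python
def mp_low_order_energies(h_diag, anti):
    """Return (E0, E1, E_HF) for a set of occupied spinorbitals.

    h_diag[i]  = <i|h|i>
    anti[i][j] = <ij||ij>  (antisymmetrized, so anti[i][i] = 0)
    """
    n = len(h_diag)
    # Assumed canonical HF orbital energies of the occupied spinorbitals.
    eps = [h_diag[i] + sum(anti[i][j] for j in range(n)) for i in range(n)]
    e_zero = sum(eps)                                   # eq. (115)
    pair_sum = sum(anti[i][j] for i in range(n) for j in range(n))
    e_hf = sum(h_diag) + 0.5 * pair_sum                 # eq. (119)
    e_first = e_hf - e_zero                             # E1 = E_HF - E0
    return e_zero, e_first, e_hf


# Hypothetical two-electron example.
H_DIAG = [-1.0, -1.0]
ANTI = [[0.0, 0.6], [0.6, 0.0]]

e_zero, e_first, e_hf = mp_low_order_energies(H_DIAG, ANTI)
print(f"sum of orbital energies E0 = {e_zero:.3f}")
print(f"first-order correction  E1 = {e_first:.3f}")
print(f"Hartree-Fock energy        = {e_hf:.3f}")
```

In this toy example the first-order correction equals $-\frac{1}{2}\sum_{ij}\langle ij\|ij\rangle$: the sum of orbital energies counts the electron-electron repulsion twice, and the first-order term removes the double counting.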

The first-order energy can then be written as
\[
\begin{aligned}
E_0^{(1)} &= \langle 0|\hat H|0\rangle - \langle 0|\hat H_0|0\rangle = -\sum_i\varepsilon_i + \sum_i h_{ii} + \frac{1}{2}\sum_{ij}\langle ij\|ij\rangle \\
&= -\sum_i u_{ii} + \frac{1}{2}\sum_{ij}\langle ij\|ij\rangle = -\sum_{ij}\langle ij\|ij\rangle + \frac{1}{2}\sum_{ij}\langle ij\|ij\rangle = -\frac{1}{2}\sum_{ij}\langle ij\|ij\rangle,
\end{aligned} \tag{121}
\]
where we have used $\hat F = \hat H_1 + \hat U$ and $f_{pq} = h_{pq} + u_{pq}$. In deriving the second-order result, the explicit form of the perturbation operator $\hat V$ should be specified. We can write
\[
\hat V = \hat H - \hat H_0 . \tag{122}
\]
To evaluate the second-order formula, the only matrix element we need is $V_{0K}$, since the second-order energy correction can also be written as
\[
E_0^{(2)} = -\sum_{K\ne 0}\frac{|V_{0K}|^2}{E_K^{(0)} - E_0^{(0)}}, \tag{123}
\]
where $K$ labels an excited state. In principle, it can be a $p$-fold excited state with $p = 1, 2, 3, \dots$ However, it is easy to show that only $p = 2$ contributes to $V_{0K}$. Let us first check the role of singly excited states. From the Brillouin theorem we know that the full Hamiltonian does not have such a matrix element, that is,
\[
H_{0K} = \langle\Psi_0^{(0)}|\hat H|\Psi_K^{(0)}\rangle = 0, \tag{124}
\]
\[
\langle\Psi_0^{(0)}|\hat H_0 + \hat V|\Psi_K^{(0)}\rangle = E_K^{(0)}\langle\Psi_0^{(0)}|\Psi_K^{(0)}\rangle + \langle\Psi_0^{(0)}|\hat V|\Psi_K^{(0)}\rangle = V_{0K} = 0, \tag{125}
\]
where the zeroth-order Schrödinger equation and the orthogonality of the zeroth-order states are utilized. It follows that $V_{0K} = 0$ if $K$ is a singly excited state. Any state more than doubly excited also gives $V_{0K} = 0$: since $\hat V$ contains at most two-electron terms, the Slater-Condon rule for a two-electron operator with more than two noncoincidences gives zero.

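So only doubly excited states contribute to the matrix element $V_{0K}$, and thus only they enter the second-order formula (123). With this result, the matrix element $V_{0K}$ for a doubly excited state $K$ can be evaluated:
\[
V_{0K} = \langle\Psi_0^{(0)}|\hat H - \hat H_0|\Psi_K^{(0)}\rangle . \tag{126}
\]
First, we evaluate the one-electron part of $\hat V$ using the generalized Wick's theorem and (118),
\[
\langle\Psi_0^{(0)}|\hat H_1|\Psi_K^{(0)}\rangle = \sum_{pq} h_{pq}\langle\Psi_0^{(0)}|\{\hat p^\dagger\hat q\}|\Psi_K^{(0)}\rangle = \sum_{pq} h_{pq}\langle\Psi_0^{(0)}|\{\hat p^\dagger\hat q\}\{\hat a^\dagger\hat b^\dagger\hat j\hat i\}|\Psi_0^{(0)}\rangle, \tag{127}
\]
where $p$ and $q$ include both hole and particle states. The second (constant) term of $\hat H_1$ in (118) does not contribute because $\Psi_0^{(0)}$ and $\Psi_K^{(0)}$ are orthogonal. Expanding the product of normal products, we get
\[
= \sum_{pq} h_{pq}\langle\Psi_0^{(0)}|\,\{\hat p^\dagger\hat q\,\hat a^\dagger\hat b^\dagger\hat j\hat i\} + \big(\text{all allowed contractions between } \{\hat p^\dagger\hat q\} \text{ and } \{\hat a^\dagger\hat b^\dagger\hat j\hat i\}\big)\,|\Psi_0^{(0)}\rangle . \tag{128}
\]
None of the terms obtained here is fully contracted, so the vacuum expectation value of such an operator vanishes. The same argument holds for $\hat H_0$. So the only nonzero contribution comes from the two-electron operator of $\hat H$. Now,
\[
\langle\Psi_0^{(0)}|\hat H_2|\Psi_K^{(0)}\rangle = \langle\Psi_0^{(0)}|\hat H_{2,N} + \hat H'_{2,N} + \frac{1}{2}\sum_{ij}\langle ij\|ij\rangle|\Psi_K^{(0)}\rangle . \tag{129}
\]
Here again the second term gives zero because $\hat H'_{2,N}$ has a form similar to that of $\hat H_0$, and the third term vanishes because of orthogonality. Then,
\[
\langle\Psi_0^{(0)}|\hat H_2|\Psi_K^{(0)}\rangle = \langle\Psi_0^{(0)}|\hat H_{2,N}|\Psi_K^{(0)}\rangle = \frac{1}{2}\sum_{pqrs}\langle pq|\hat v|rs\rangle\langle\Psi_0^{(0)}|\{\hat p^\dagger\hat q^\dagger\hat s\hat r\}\{\hat a^\dagger\hat b^\dagger\hat j\hat i\}|\Psi_0^{(0)}\rangle . \tag{130}
\]
Using the generalized Wick's theorem and collecting the fully contracted terms with nonzero contractions,
\[
= \frac{1}{2}\sum_{pqrs}\langle pq|\hat v|rs\rangle\langle\Psi_0^{(0)}|\,\{\hat p^\dagger\hat q^\dagger\hat s\hat r\,\hat a^\dagger\hat b^\dagger\hat j\hat i\} + \big(\text{the four fully contracted terms}\big)\,|\Psi_0^{(0)}\rangle . \tag{131}
\]
The first term has no contribution, so that
\[
\begin{aligned}
&= \frac{1}{2}\sum_{pqrs}\langle pq|\hat v|rs\rangle\left[-\delta_{pi}\delta_{qj}\delta_{sa}\delta_{rb} + \delta_{pi}\delta_{qj}\delta_{sb}\delta_{ra} - \delta_{pj}\delta_{qi}\delta_{sb}\delta_{ra} + \delta_{pj}\delta_{qi}\delta_{sa}\delta_{rb}\right] \\
&= \frac{1}{2}\left[-\langle ij|\hat v|ba\rangle + \langle ij|\hat v|ab\rangle - \langle ji|\hat v|ab\rangle + \langle ji|\hat v|ba\rangle\right] \\
&= \langle ij|\hat v|ab\rangle - \langle ij|\hat v|ba\rangle = \langle ij\|ab\rangle .
\end{aligned} \tag{132}
\]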

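We then collect all nonzero terms and substitute them into (123). The excitation energy in the denominator of the second-order formula is determined by the change in the sum of the orbital energies due to the change in the occupancy of the orbitals upon excitation:
\[
E_0^{(2)} = -\sum_{a<b,\,i<j}\frac{|\langle ij\|ab\rangle|^2}{\varepsilon_a + \varepsilon_b - \varepsilon_i - \varepsilon_j} . \tag{133}
\]
Equation (133) is the second-order Møller-Plesset (MP2) formula for the energy correction in terms of spinorbitals. Similarly, for the third-order formula (94), we have already calculated $V_{00} = E_0^{(1)}$ and $V_{0K}$. The only unknown matrix element is $V_{KJ} = \langle\Psi_K^{(0)}|\hat V|\Psi_J^{(0)}\rangle$. Here also, only doubly excited states with the two-electron operator term contribute, as mentioned before. Using the generalized Wick's theorem we can write
\[
\langle\Psi_{lm}^{cd}|\hat p^\dagger\hat p|\Psi_{ij}^{ab}\rangle = 0, \qquad \langle\Psi_{lm}^{cd}|\hat p^\dagger\hat q|\Psi_{ij}^{ab}\rangle = 0 . \tag{134}
\]
Then the normal-ordered two-electron part of $\hat V$ in (111) gives
\[
\begin{aligned}
V_{KJ} = \langle\Psi_K^{(0)}|\hat V|\Psi_J^{(0)}\rangle &= \frac{1}{4}\sum_{pqrs}\langle pq\|rs\rangle\langle\Psi_0^{(0)}|\{\hat l^\dagger\hat m^\dagger\hat d\hat c\}\{\hat p^\dagger\hat q^\dagger\hat s\hat r\}\{\hat a^\dagger\hat b^\dagger\hat j\hat i\}|\Psi_0^{(0)}\rangle \\
&= \frac{1}{4}\sum_{pqrs}\langle pq\|rs\rangle\langle\Psi_0^{(0)}|\,\{\hat l^\dagger\hat m^\dagger\hat d\hat c\,\hat p^\dagger\hat q^\dagger\hat s\hat r\,\hat a^\dagger\hat b^\dagger\hat j\hat i\} + \big(\text{many contracted terms}\big)\,|\Psi_0^{(0)}\rangle .
\end{aligned} \tag{135}
\]
The remaining work is left as an exercise. If we evaluate all fully contracted terms, we end up with
\[
\begin{aligned}
E_0^{(3)} &= \frac{1}{8}\sum_{i,j,l,m}^{\rm occ}\sum_{a,b}^{\rm vir}\frac{\langle ij\|ab\rangle\langle ab\|lm\rangle\langle lm\|ij\rangle}{\left(\varepsilon_a + \varepsilon_b - \varepsilon_i - \varepsilon_j\right)\left(\varepsilon_a + \varepsilon_b - \varepsilon_l - \varepsilon_m\right)} \\
&\quad + \frac{1}{8}\sum_{i,j}^{\rm occ}\sum_{a,b,c,d}^{\rm vir}\frac{\langle ij\|ab\rangle\langle ab\|cd\rangle\langle cd\|ij\rangle}{\left(\varepsilon_a + \varepsilon_b - \varepsilon_i - \varepsilon_j\right)\left(\varepsilon_c + \varepsilon_d - \varepsilon_i - \varepsilon_j\right)} \\
&\quad + \sum_{i,j,l}^{\rm occ}\sum_{a,b,c}^{\rm vir}\frac{\langle ij\|ab\rangle\langle lb\|cj\rangle\langle ac\|il\rangle}{\left(\varepsilon_a + \varepsilon_b - \varepsilon_i - \varepsilon_j\right)\left(\varepsilon_a + \varepsilon_c - \varepsilon_i - \varepsilon_l\right)} .
\end{aligned} \tag{136}
\]
This is the MP3 formula for the energy correction in terms of spinorbitals; the factors 1/8 compensate for the unrestricted summations in the first two terms.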
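The MP2 formula (133) translates directly into a loop over unique pairs of occupied and virtual spinorbitals. The following is a minimal sketch, not a production implementation: the function `mp2_energy`, the callback `anti(i, j, a, b)` returning $\langle ij\|ab\rangle$, and the toy orbital energies and integral value are all hypothetical.

```python
from itertools import combinations


def mp2_energy(eps_occ, eps_vir, anti):
    """Second-order energy from eq. (133) in a spinorbital basis.

    eps_occ, eps_vir: orbital energies of occupied and virtual spinorbitals.
    anti(i, j, a, b): callable returning the antisymmetrized integral <ij||ab>.
    """
    energy = 0.0
    for i, j in combinations(range(len(eps_occ)), 2):        # i < j
        for a, b in combinations(range(len(eps_vir)), 2):    # a < b
            denominator = eps_vir[a] + eps_vir[b] - eps_occ[i] - eps_occ[j]
            energy -= abs(anti(i, j, a, b)) ** 2 / denominator
    return energy


# Hypothetical model: two occupied and two virtual spinorbitals, so there is
# exactly one double excitation.
EPS_OCC = [-0.5, -0.5]
EPS_VIR = [0.5, 0.5]


def toy_integral(i, j, a, b):
    """Made-up value of <ij||ab> for the single double excitation."""
    return 0.2


e_mp2 = mp2_energy(EPS_OCC, EPS_VIR, toy_integral)
print(f"toy MP2 energy: {e_mp2:.6f}")
```

In this toy case the result is $-0.2^2/2.0 = -0.02$. The restricted sums $i<j$, $a<b$ visit each double excitation once, which is why no factor 1/4 appears in (133).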

D. Diagrammatic expansions for MPPT

1. Diagrammatic notation

Evaluating terms in the second-quantization treatment can be cumbersome and error-prone, so a diagrammatic notation was introduced to make the calculations easier. It helps to list all nonvanishing distinct terms in the perturbation sums, to elucidate certain cancellations in these sums, and to provide a systematic way of discussing and manipulating the various surviving terms. Time ordering represents the time sequence in the application of the various operators, and this is indicated in the diagrams by means of a time axis for the sequence of events. Here the time axis is taken to be vertical, running from bottom to top (Fig. 1); another common arrangement is to place the time axis horizontally, from right to left. The actual time at which each event occurs (i.e., an operator acts) is irrelevant; only the sequence is significant.

[FIG. 1. Time ordering: time axis t.]

We start with the representation of a Slater determinant (SD). The reference state (the Fermi vacuum) is represented by nothing, i.e., by a position on the time axis at which there are no lines or other symbols. Any other SD is represented by vertical or diagonal directed lines, pointing upward for particles and downward for holes, with labels identifying the spinorbitals.

[Diagram: $\Psi_i^a = \{\hat a^\dagger\hat i\}|0\rangle$, hole line $i$ and particle line $a$ above a horizontal double line.]

The horizontal double line represents the point of operation of the normal-product operator, and below or above it we have the Fermi vacuum. To avoid phase ambiguity, we can indicate which particle index appears above which hole index.

[Diagram: $\Psi_{ij}^{ab} = \{\hat a^\dagger\hat b^\dagger\hat j\hat i\}|0\rangle$, line pairs $i$, $a$ and $j$, $b$.]

2. One-particle operator

Now we consider the representation of operators. We begin with a one-electron operator in normal-product form, say,
\[
\hat U_N = \sum_{pq}\langle p|\hat u|q\rangle\{\hat p^\dagger\hat q\}, \tag{137}
\]
acting on the singly excited Slater determinant $\Psi_i^a = \{\hat a^\dagger\hat i\}|0\rangle$. The action and representation of the individual terms in the sum over $p$, $q$ in (137) depend on whether $p$ and $q$ are particle or hole indices. For illustration, we consider only two cases, particle-particle (pp) and particle-hole (ph); the remaining cases are left as an exercise. We begin with a pp term. Applying the one-electron operator to the singly excited Slater determinant, we obtain (using the generalized Wick's theorem)
\[
\langle b|\hat u|c\rangle\{\hat b^\dagger\hat c\}\{\hat a^\dagger\hat i\}|0\rangle = \langle b|\hat u|c\rangle\,\delta_{ac}\,\Psi_i^b = \langle b|\hat u|a\rangle\,\Psi_i^b, \tag{138}
\]
which is represented by the diagram

[Diagram: particle line $a$ entering the vertex, particle line $b$ leaving it, hole line $i$ passing by.]

Here at the bottom we had $\Psi_i^a$, and at the top we have $\Psi_i^b$, the resulting determinant. The point of action of the operator is marked by the interaction line (or vertex). We associate the integral $\langle b|\hat u|a\rangle$ with the vertex as a multiplicative factor. Note that the bra spinorbital in the integral corresponds to the line leaving the vertex, while the ket corresponds to the entering line.

Similarly, for a ph term,
\[
\langle b|\hat u|j\rangle\{\hat b^\dagger\hat j\}\{\hat a^\dagger\hat i\}|0\rangle = \langle b|\hat u|j\rangle\,\Psi_{ij}^{ab}, \tag{139}
\]
showing that the resulting determinant is $\Psi_{ij}^{ab}$.

[Diagram: lines $b$ and $j$ attached to the vertex, lines $i$ and $a$ passing by.]

The following principles are used to draw these kinds of diagrams:

1. The interaction is denoted by a dotted, horizontal line and the electron orbitals involved in that interaction by solid, vertical lines, connected with the interaction line at a vertex.

2. A core orbital is represented by a line directed downwards (hole line) and a virtual orbital by a line directed upwards (particle line).

3. The orbitals belonging to the initial state (to the right in the matrix element) have their arrows pointing toward the interaction vertex, those of the final state away from the vertex.

3. Two-particle operators

We now turn to a two-particle operator in normal-product form,
\[
\hat W = \frac{1}{2}\sum_{pqrs}\langle pq|rs\rangle\{\hat p^\dagger\hat q^\dagger\hat s\hat r\} = \frac{1}{4}\sum_{pqrs}\langle pq\|rs\rangle\{\hat p^\dagger\hat q^\dagger\hat s\hat r\} . \tag{140}
\]
This operator is denoted by an interaction line connecting two half-vertices at the same level (i.e., the same point on the time axis). The two half-vertices and the interaction line constitute a single vertex. Each individual half-vertex has one incoming and one outgoing line, each of which may be a particle line or a hole line. The association of line labels with the two-electron integral indices and the creation or annihilation operators follows the same rule as for one-body vertices:
\[
\begin{aligned}
\text{incoming line} &\leftrightarrow \text{annihilation operator} \leftrightarrow \text{ket state}, \\
\text{outgoing line} &\leftrightarrow \text{creation operator} \leftrightarrow \text{bra state}.
\end{aligned} \tag{141}
\]

85 electron 1 left half-vertex electron 2 right half-vertex (142) The integral indices associated with a two-body vertex are assigned according to the scheme left-out right-out left-in right-in (143) while the corresponding operator product can be described by {(left-out) (right-out) (left-in) (right-in)} (144) Diagrams employing this representation of the two-body interaction (which is based on non-antisymmetrized integrals) are called Goldstone diagrams. Consider a simple example of vacuum expectation value of Ŵ 2. Then using Wick s theorem we obtain, 0 Ŵ 2 0 = 1 pq rs 1 tu vw 0 { ˆp ˆq ŝ ˆr}{ˆt û ŵ ˆv} pqrs tuvw = 1 ij ab ab ij ij ab ba ij 4 abij abij ij ab ab ji + ij ab ba ji abij abij (145) The diagrammatic description of terms can be done easily using rules defined earlier. The first and fourth diagrams are equivalent (by exchange of the two half-vertices at the top or bottom) and so are the second and third. So keeping only first and third terms in the sum. And to identify the correct phase factor, there is a rule ( 1) h l. Where h is the number of hole lines in the loop and l is the number of loops. So, 0 Ŵ 2 0 = 1 ij ab ab ij 1 ij ab ab ji (146) 2 2 abij The factor 1/2 derives from the fact that each of these diagrams is symmetric under reflection in a vertical plane through its middle. Now if we want to do similar calculations for matrix element like 0 Ŵ 3 0, Goldstone representation will have number of distinct diagrams as the number of interaction vertices increases, reflecting the individual listing 85 abij

of each possible exchange. There is also some difficulty in making sure that all those distinct possibilities have been listed exactly once, since it is not always easy to determine whether two diagrams are equivalent. However, the advantage of Goldstone diagrams is the straightforward determination of phase factors. The difficulties associated with the use of the Goldstone representation can be overcome by basing the analysis on the antisymmetric integrals $\langle pq\|rs\rangle$. Since the exchange contribution is incorporated within each antisymmetrized integral, such an approach leads to a much smaller number of distinct diagrams. The diagrams using this representation of the $\hat W$ operator are called Hugenholtz diagrams.

4. Hugenholtz diagrams

Hugenholtz diagrams maintain the usual (Goldstone) form for one-body operators but represent the two-body vertex as a single large dot with two incoming and two outgoing lines (each of which can be a particle or hole line). The labels on the outgoing lines appear in the bra part of the antisymmetrized integral, while the incoming labels appear in the ket part. The order of the labels in each part is indeterminate, and therefore the phase of the corresponding algebraic interpretation is indeterminate. The Hugenholtz representation of the $\langle 0|\hat W^2|0\rangle$ matrix element has just one distinct diagram instead of two,

$$\langle 0|\hat W^2|0\rangle = [\text{Hugenholtz diagram}] = \frac{1}{4}\sum_{abij}\langle ij\|ab\rangle\langle ab\|ij\rangle$$

Expansion of the antisymmetrized integrals in terms of ordinary integrals gives four terms, which are equal in pairs, reproducing the two-term result obtained with Goldstone diagrams. The weight factor 1/4 is obtained by counting the number of pairs of equivalent lines in the diagram: a pair of lines is equivalent if they connect the same pair of vertices in the same direction. Each pair of equivalent lines contributes a factor 1/2. The diagram for $\langle 0|\hat W^2|0\rangle$ has two such pairs, resulting in a weight factor 1/4.

Evaluating a higher-order matrix element such as $\langle 0|\hat W^3|0\rangle$ in both the Goldstone and the Hugenholtz representation is left as an exercise; it is a good way to convince yourself of the power of the two representations.
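The equality of the two-term Goldstone result, Eq. (146), and the one-term Hugenholtz result can also be checked numerically. The following is a minimal sketch, assuming random real integrals as stand-in data; the function names and the storage convention `g[p, q, r, s]` for the ordinary integral with bra indices p, q and ket indices r, s (occupied orbitals first) are choices made here for illustration and are not part of the notes.

```python
import numpy as np

def random_integrals(n, seed=0):
    """Random real two-electron integrals g[p,q,r,s] (hypothetical test data)
    with the symmetries g[p,q,r,s] = g[q,p,s,r] = g[r,s,p,q]."""
    rng = np.random.default_rng(seed)
    g = rng.normal(size=(n, n, n, n))
    g = g + g.transpose(1, 0, 3, 2)   # relabel electrons 1 and 2
    g = g + g.transpose(2, 3, 0, 1)   # real integrals: swap bra and ket
    return g

def w2_goldstone(g, nocc):
    """Eq. (146): one direct and one exchange Goldstone diagram, weight 1/2 each."""
    o, v = slice(0, nocc), slice(nocc, None)
    direct = np.einsum("ijab,abij->", g[o, o, v, v], g[v, v, o, o])
    exchange = np.einsum("ijab,abji->", g[o, o, v, v], g[v, v, o, o])
    return 0.5 * direct - 0.5 * exchange

def w2_hugenholtz(g, nocc):
    """Single Hugenholtz diagram with antisymmetrized integrals, weight 1/4."""
    o, v = slice(0, nocc), slice(nocc, None)
    ga = g - g.transpose(0, 1, 3, 2)  # antisymmetrize in the ket indices
    return 0.25 * np.einsum("ijab,abij->", ga[o, o, v, v], ga[v, v, o, o])
```

Calling `w2_goldstone(g, nocc)` and `w2_hugenholtz(g, nocc)` on the same array `g = random_integrals(n)` should give the same number up to rounding, which is the statement that the four expanded terms are equal in pairs.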

5. Antisymmetrized Goldstone diagrams

The construction of antisymmetrized Goldstone (ASG) diagrams can be summarized by the following rules:

1. Generate all distinct Hugenholtz skeletons.
2. For each skeleton assign arrows in all distinct ways to generate Hugenholtz diagrams.
3. Expand each Hugenholtz diagram into an ASG diagram in any of the possible equivalent ways.
4. Interpret each two-body vertex in each ASG diagram in terms of an antisymmetrized integral, with the usual $\langle$left-out right-out $\|$ left-in right-in$\rangle$ arrangement.
5. Interpret each one-body vertex in each ASG diagram as in ordinary Goldstone diagrams.
6. Assign a phase factor $(-1)^{h-l}$, as for ordinary Goldstone diagrams.
7. Assign a weight factor $(\frac{1}{2})^n$, where $n$ is the number of equivalent line pairs; two lines are equivalent if they connect the same two vertices in the same direction.

6. Diagrammatic representation of RSPT

The zero- and first-order energies are given by

$$E_0^{(0)} = \sum_i \varepsilon_i \tag{147}$$

$$E_0^{(1)} = -\frac{1}{2}\sum_{ij}\langle ij|\hat V|ij\rangle \tag{148}$$

The second-order energy expression can alternatively be written in the following equivalent form, which is more useful,

$$E_0^{(2)} = \sum_{K\neq 0}\frac{\langle\Psi_0^{(0)}|\hat V|\Psi_K^{(0)}\rangle\langle\Psi_K^{(0)}|\hat V|\Psi_0^{(0)}\rangle}{E_0^{(0)} - E_K^{(0)}} = \langle\Psi_0^{(0)}|\hat V\hat R_0\hat V|\Psi_0^{(0)}\rangle = \frac{1}{4}\sum_{a,b,i,j}\frac{|\langle ij\|ab\rangle|^2}{\varepsilon_i + \varepsilon_j - \varepsilon_a - \varepsilon_b} \tag{149}$$

where

$$\hat R_0 = \sum_{K\neq 0}\frac{|\Psi_K^{(0)}\rangle\langle\Psi_K^{(0)}|}{E_0^{(0)} - E_K^{(0)}} \tag{150}$$

is called the resolvent operator. Its presence in an expression is represented diagrammatically by a thin horizontal line cutting the particle-hole lines, as shown in the figure.

[Figure: resolvent line $\hat R_0$ cutting the lines labeled a, i, j, b]

$\hat R_0$ does not change the state on which it operates; it only represents the division by the energy denominator. Therefore any particle or hole lines present below the point of action of $\hat R_0$ continue unchanged above it.

The MP2 and MP3 expressions derived above are correlation energies that correct the Hartree-Fock energy (the sum of the zeroth- and first-order energies is the Hartree-Fock energy itself). Computationally, MP2 is the least expensive of these and already gives a significant improvement. In principle, one could go to higher orders of perturbation theory (MP3, MP4, etc.), but the computer programs become hard to write, and the results (perhaps surprisingly) do not necessarily get any better.

E. Time versions

Diagrams which may be transformed one into the other by topological deformations (transformations) that do not preserve the order of operators along the time axis are referred to as time versions of the same diagram.

1. Time version of the first kind

Time versions of the first kind are obtained one from another by a permutation of vertices which does not change the particle-hole character of any of the fermion lines in the diagram. For example, the diagrams in Fig. (2) are time versions of the first kind of a fourth-order energy diagram with two U vertices.

2. Time version of the second kind

When the vertex permutation changes the hole-particle character of at least one line, we obtain different time versions of the second kind. Thus the diagram in Fig. (3) is a time version of the second kind of the first diagram.
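Returning briefly to the resolvent of Eq. (150): its action is easy to see on a small matrix model. The sketch below is only an illustration under stated assumptions; the toy Hamiltonian (a diagonal zeroth-order part with unit level spacing plus a weak random symmetric perturbation) and all names are hypothetical and do not come from the notes. It evaluates the second-order energy once as a sum over states and once as the matrix element of $\hat V\hat R_0\hat V$, the two forms equated in Eq. (149).

```python
import numpy as np

def toy_problem(n=6, scale=0.01, seed=0):
    """Hypothetical model: nondegenerate levels e0[K] = K and a weak
    real symmetric perturbation V; state 0 is the reference."""
    rng = np.random.default_rng(seed)
    e0 = np.arange(n, dtype=float)
    V = rng.normal(scale=scale, size=(n, n))
    V = 0.5 * (V + V.T)
    return e0, V

def second_order_energy(e0, V):
    """Return E(2) computed (i) by the sum over states and
    (ii) with the resolvent matrix R0 of Eq. (150)."""
    n = len(e0)
    sum_over_states = sum(V[0, K] * V[K, 0] / (e0[0] - e0[K]) for K in range(1, n))
    R0 = np.zeros((n, n))
    for K in range(1, n):
        R0[K, K] = 1.0 / (e0[0] - e0[K])   # projector on K divided by the denominator
    via_resolvent = V[0] @ R0 @ V[:, 0]
    return sum_over_states, via_resolvent
```

For a weak perturbation, the sum `e0[0] + V[0, 0] + E2` should agree with the lowest eigenvalue of `np.diag(e0) + V` up to terms of third order in the perturbation.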

FIG. 2. Time version of first kind.

FIG. 3. Time version of second kind.

F. Connected and disconnected diagrams

In the second-order wavefunction we have both disconnected and connected diagrams,

$$|\Psi^{(2)}\rangle = \hat R_0\hat W\hat R_0\hat W|0\rangle$$

Fig. (4) shows all the possible Hugenholtz diagrams for the second-order wavefunction contribution. We have one disconnected diagram (i), which yields a quadruply excited contribution, while the remaining diagrams are connected and correspond to triply (ii), (iii), doubly (iv), (v), (vi) and singly (vii), (viii) excited contributions. No vacuum contribution can arise, since any diagram having only internal lines (i.e. an energy diagram) would lead to a dangerous denominator.

FIG. 4. Second order wavefunction correction.

G. Linked and unlinked diagrams

FIG. 5. Third order wavefunction correction.

We have seen in the preceding section that for the wavefunction we obtain disconnected diagrams already in the second order. In higher orders of perturbation theory disconnected diagrams of another type occur. For example, a few possible disconnected third-order wavefunction diagrams are shown in Fig. (5). Even though all these diagrams are bona fide wavefunction diagrams (i.e., they have no dangerous denominators), we shall see that the last diagram (iii) has a very different character than the first two diagrams (i) and (ii), since it contains an energy diagram as a disconnected part. We shall refer to energy diagrams, which have no external lines, as vacuum diagrams (or vacuum parts when they form a disconnected part of some diagram), since they represent Fermi vacuum mean values. Further, a disconnected diagram is unlinked if it has at least

one disconnected vacuum part, and linked if it has no disconnected vacuum part. Any unlinked diagram is by definition a disconnected diagram, while a linked diagram can be either connected or disconnected. In the latter case, however, none of its disconnected parts can be a vacuum diagram. Conversely, a connected diagram is always linked (even if it is a vacuum diagram), while a disconnected diagram can be either linked or unlinked, depending on whether or not all of its disconnected parts are of non-vacuum type.

Obviously, each unlinked diagram has a number of time versions of the first kind, since its disconnected parts can be positioned relative to one another in all distinct ways which do not introduce dangerous denominators. Thus, in the case of diagram Fig. (5(iii)), there are two possible time versions, as shown in Fig. (6).

FIG. 6. Unlinked part of third order wavefunction correction.

The contributions from these two time versions differ only in the denominator part, since all the scalar factors associated with the vertices and all the operators associated with external lines are clearly identical. Designating the denominator of the vacuum part, considered as a separate diagram, by $a$ and, similarly, the denominator of the part involving external lines (considered separately) by $b$, the contribution from both time versions is

$$\frac{N}{b(a+b)b} + \frac{N}{a(a+b)b}$$

where $N$ designates the identical numerator part. Carrying out the sum we get

$$\frac{N}{b(a+b)}\left(\frac{1}{b} + \frac{1}{a}\right) = N\,\frac{a+b}{b(a+b)\,ab} = N\,\frac{1}{ab^2} \tag{151}$$
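The algebra of Eq. (151) can be confirmed with exact rational arithmetic. This is a small sketch using Python's standard `fractions` module; the function names and the sample values of $a$, $b$ and $N$ are arbitrary choices made here for illustration.

```python
from fractions import Fraction

def sum_of_time_versions(a, b, N=1):
    """Sum of the two time versions: N/(b(a+b)b) + N/(a(a+b)b)."""
    a, b, N = Fraction(a), Fraction(b), Fraction(N)
    first = N / (b * (a + b) * b)
    second = N / (a * (a + b) * b)
    return first + second

def closed_form(a, b, N=1):
    """Right-hand side of Eq. (151): N/(a b^2)."""
    a, b, N = Fraction(a), Fraction(b), Fraction(N)
    return N / (a * b * b)
```

For any nonzero $a$, $b$ with $a + b \neq 0$ (for instance the negative values typical of energy denominators), the two functions return the same fraction.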

The above result is easily seen to be precisely the contribution, except for the sign, of the renormalization term in the third-order wavefunction,

$$|\Psi^{(3)}\rangle = \underbrace{(\hat R_0\hat W)^3|0\rangle}_{\text{principal term}} - \underbrace{\langle 0|\hat W\hat R_0\hat W|0\rangle\,\hat R_0^2\hat W|0\rangle}_{\text{renormalization term}} \tag{152}$$

The renormalization term is given, up to the sign, by the product of the second-order energy contribution and the first-order wavefunction contribution taken with the squared denominator, and thus equals $N\,\frac{1}{a}\,\frac{1}{b^2}$. Therefore, the renormalization term in Eq. (152) exactly cancels the contribution of the unlinked diagrams of Fig. (6) originating from the principal third-order term, given by Eq. (151).

H. Factorization lemma (Frantz and Mills)

Consider all the possible time versions of a linked diagram consisting of two disconnected parts called A and B, as shown in Fig. (7).

FIG. 7. Linked disconnected diagrams for parts A and B.

Let the set of energy denominators for the part A alone be $A_\mu$, $\mu = 1,\dots,m$, and for the part B alone be $B_\nu$, $\nu = 1,\dots,n$. The denominators are numbered along the time axis, i.e., the lowest denominators are $A_1$ and $B_1$ in parts A and B, respectively. The denominator contribution from all time versions of the first kind, corresponding to all possible orderings of the vertices in parts A and B relative to one another,

can be written as $D^{AB}_{mn}$, given by

$$D^{AB}_{mn} = \sum_{\{\alpha,\beta\}}\ \prod_{p=1}^{m+n}\left(A_{\alpha(p)} + B_{\beta(p)}\right)^{-1} \tag{153}$$

where the summation extends over all the sets of $(m+n)$ integer pairs $\Gamma_p = (\alpha(p),\beta(p))$ such that $0 \le \alpha(p) \le m$ and $0 \le \beta(p) \le n$. The pairs $\Gamma_p$ are defined as follows:

1. $\Gamma_1 = (1,0)$ or $\Gamma_1 = (0,1)$.
2. $\Gamma_{p+1} = (\alpha(p)+1,\beta(p))$ or $\Gamma_{p+1} = (\alpha(p),\beta(p)+1)$.
3. $\Gamma_{m+n} = (m,n)$, i.e. $\alpha(m+n) = m$, $\beta(m+n) = n$.

We also define $A_0 = B_0 = 0$. For the separate disconnected parts the denominators are given by the products of $A_\mu$ or $B_\nu$, which can also be written using the general expression Eq. (153) as

$$D^A_m = D^{AB}_{m0} = \prod_{\mu=1}^{m}(A_\mu)^{-1} \quad\text{and}\quad D^B_n = D^{AB}_{0n} = \prod_{\nu=1}^{n}(B_\nu)^{-1} \tag{154}$$

where we define $D^A_0 = D^B_0 = D^{AB}_{00} = 1$. The desired factorization lemma can now be simply stated as

$$D^{AB}_{mn} = D^A_m D^B_n \tag{155}$$

The proof is easily carried out by mathematical induction. The lemma holds when $m = 0$ or $n = 0$, since $D^{AB}_{m0} = D^A_m D^B_0 = D^A_m$ and $D^{AB}_{0n} = D^A_0 D^B_n = D^B_n$, in agreement with Eq. (154). Assume that the lemma holds for $M = m-1, N = n$ and for $M = m, N = n-1$, with $m, n \ge 1$, i.e. $D^{AB}_{MN} = D^A_M D^B_N$. Clearly, all the terms in $D^{AB}_{mn}$ can be divided into two disjoint classes according to whether the last (topmost) interaction occurs in the A or in the B subgraph. The last (top) denominator factor being always the same, as required by condition 3, namely $(A_m + B_n)^{-1}$, we can write

$$D^{AB}_{mn} = (A_m + B_n)^{-1}\left(D^{AB}_{m-1,n} + D^{AB}_{m,n-1}\right) \tag{156}$$

since all the remaining factors are identical with those characterizing the disconnected diagrams which result when one top vertex is deleted, either in the A part ($M = m-1$, $N = n$) or in the B part ($M = m$, $N = n-1$). This result holds even when $m$ or $n$ equals 1.

Since the lemma Eq. (155) holds for $M = m-1, N = n$ and $M = m, N = n-1$ by assumption, we can write Eq. (156) as

$$D^{AB}_{mn} = (A_m + B_n)^{-1}\left(D^A_{m-1}D^B_n + D^A_m D^B_{n-1}\right) \tag{157}$$

The denominators of the separate parts are given by a single product, Eq. (154), so we have

$$D^A_m = D^A_{m-1}(A_m)^{-1} \ \Rightarrow\ D^A_{m-1} = D^A_m A_m, \qquad D^B_n = D^B_{n-1}(B_n)^{-1} \ \Rightarrow\ D^B_{n-1} = D^B_n B_n$$

so we get

$$D^{AB}_{mn} = (A_m + B_n)^{-1}\left(D^A_m A_m D^B_n + D^A_m D^B_n B_n\right) = D^A_m D^B_n$$

proving the lemma.

I. Linked-cluster theorem

We saw that for the third-order correction to the wavefunction the renormalization term was cancelled by the unlinked part of the principal term. This happens at all orders, so the contribution in each order is given by the linked diagrams only. In the derivation below, the linked-diagram expansions for the wave function and energy are substituted into the recursive form Eq. (158) of the Schrödinger equation (this equation can be found in Shavitt's book, Chapter 2, Eq. (2.75)), and the factorization theorem is used to show that this expansion satisfies the equation. To prove this assertion we first rewrite Eq. (2.75) from Shavitt's book in a form appropriate for RSPT,

$$|\Psi\rangle = |0\rangle + \hat R_0(\hat W - \Delta E)|\Psi\rangle \tag{158}$$

$$\Delta E = \langle 0|\hat W|\Psi\rangle \tag{159}$$

where $\hat R_0 \equiv \hat R_0(E_0)$, $\hat W = \hat V - E^{(1)}$ and $\Delta E = E - E_{\rm ref} = E - E_0 - E^{(1)}$. The implicit equations (158), (159) for $|\Psi\rangle$ and $\Delta E$ are entirely equivalent to the Schrödinger equation. We need to prove that these equations are satisfied by the linked-diagram expansions

$$|\Psi\rangle = \sum_{n=0}^{\infty}\left[(\hat R_0\hat W)^n|0\rangle\right]_L \tag{160}$$

$$\Delta E = \sum_{n=1}^{\infty}\langle 0|\hat W(\hat R_0\hat W)^n|0\rangle_L \tag{161}$$

where the subscript $L$ indicates that the summations are limited to linked diagrams only (note that the $n = 0$ term is missing in the summation for $\Delta E$ in Eq. (161) because

$\langle 0|\hat W|0\rangle = 0$). We are going to prove this assertion by substituting Eq. (160) and Eq. (161) into the recursive equations (158), (159) and showing that the latter are then satisfied. We first substitute Eq. (160) in Eq. (159), obtaining

$$\Delta E = \sum_{n=1}^{\infty}\langle 0|\hat W\left[(\hat R_0\hat W)^n|0\rangle\right]_L \tag{162}$$

It is easy to verify that all the closed diagrams that can be formed by adding a new top vertex to the upwards-open linked $n$-vertex diagrams are linked (because all disconnected parts of the open diagram must be closed by the single added vertex) and constitute the complete set of all closed linked $(n+1)$-vertex diagrams. Therefore Eq. (162) is consistent with Eq. (161). Next we substitute Eq. (160) in Eq. (158), resulting in

$$|\Psi\rangle = |0\rangle + \sum_{n=0}^{\infty}\hat R_0(\hat W - \Delta E)\left[(\hat R_0\hat W)^n|0\rangle\right]_L$$

$$= |0\rangle + \sum_{n=0}^{\infty}\hat R_0\hat W\left[(\hat R_0\hat W)^n|0\rangle\right]_L - \sum_{n=0}^{\infty}\Delta E\,\hat R_0\left[(\hat R_0\hat W)^n|0\rangle\right]_L \tag{163}$$

Each term of the first sum over $n$ in the second line of Eq. (163) consists of all the upwards-open $(n+1)$-vertex diagrams that can be formed by adding one vertex (and the corresponding resolvent) to all upwards-open linked $n$-vertex diagrams. Each resulting diagram either is linked or is unlinked with a single separate closed part (if the added vertex closed a disconnected part of the $n$-vertex open diagram) and has the top vertex of the closed part as the top vertex of the entire diagram. We may therefore rewrite Eq. (163) in the form

$$|\Psi\rangle = |0\rangle + \sum_{n=0}^{\infty}\left[\hat R_0\hat W(\hat R_0\hat W)^n|0\rangle\right]_L + \sum_{n=0}^{\infty}\left[\hat R_0\hat W\left[(\hat R_0\hat W)^n|0\rangle\right]_L\right]_U - \sum_{n=0}^{\infty}\Delta E\,\hat R_0\left[(\hat R_0\hat W)^n|0\rangle\right]_L \tag{164}$$

where the subscript $U$ indicates restriction to unlinked terms. The factorization theorem can then be used to show the cancellation of the last two sums in this equation, because each term in the third sum can be described by an open diagram with an insertion above its top vertex; this diagram cancels the contributions to the second sum from the sum of corresponding unlinked two-part open diagrams in which the top vertex of the closed part is the top vertex of the entire diagram.

The remaining terms of the right-hand side are equivalent to the linked-diagram expansion Eq. (160), proving that this expansion satisfies Eq. (158) and the Schrödinger equation.
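The factorization lemma, Eq. (155), on which this cancellation rests can be verified by brute force for small diagrams. The sketch below enumerates every relative ordering of the vertices of parts A and B as in Eq. (153) and compares the sum with the product of the separate denominators, Eq. (154). It uses exact rational arithmetic; the function names and the sample denominators in the usage note are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def d_ab(A, B):
    """Eq. (153): sum over all interleavings of prod_p 1/(A_alpha(p) + B_beta(p)).
    A[mu-1] and B[nu-1] hold the denominators A_mu and B_nu of the isolated parts."""
    m, n = len(A), len(B)
    A0 = [Fraction(0)] + [Fraction(x) for x in A]   # A_0 = 0
    B0 = [Fraction(0)] + [Fraction(x) for x in B]   # B_0 = 0
    total = Fraction(0)
    # choose which of the m+n time steps advance part A; the others advance part B
    for a_steps in combinations(range(m + n), m):
        a_steps = set(a_steps)
        alpha = beta = 0
        term = Fraction(1)
        for p in range(m + n):
            if p in a_steps:
                alpha += 1
            else:
                beta += 1
            term /= A0[alpha] + B0[beta]
        total += term
    return total

def d_single(X):
    """Eq. (154): product of inverse denominators of one isolated part."""
    out = Fraction(1)
    for x in X:
        out /= Fraction(x)
    return out
```

For example, with `A = [-2, -5, -7]` and `B = [-3, -4]` the ten interleavings summed by `d_ab(A, B)` should reproduce `d_single(A) * d_single(B)` exactly. Denominators of a single sign are assumed so that no partial sum vanishes.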

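J. Removal of spin

So far the formalism has been specified in terms of spinorbitals, and no attempt has been made to consider the effects of spin. However, since the nonrelativistic Hamiltonian does not contain spin coordinates, integration over the spin variables is easily carried out and results in significant economies in the calculations. The simplest way in which spin affects the perturbation theory summations is that some integrals vanish because of spin orthogonality. Thus, if we indicate the spin factor of a spinorbital by putting a bar over $\beta$ spinorbitals and no bar over $\alpha$ spinorbitals, we have

$$\langle pq\|rs\rangle = \langle pq|v|rs\rangle - \langle pq|v|sr\rangle, \qquad \langle p\bar q\|r\bar s\rangle = \langle pq|v|rs\rangle, \qquad \langle p\bar q\|\bar r s\rangle = -\langle pq|v|sr\rangle,$$

$$\langle \bar p q\|\bar r s\rangle = \langle pq|v|rs\rangle, \qquad \langle \bar p q\|r\bar s\rangle = -\langle pq|v|sr\rangle, \qquad \langle \bar p\bar q\|\bar r\bar s\rangle = \langle pq|v|rs\rangle - \langle pq|v|sr\rangle,$$

where the integrals on the r.h.s. are over the spatial factors only, and

$$\langle \bar p q\|rs\rangle = \langle p\bar q\|rs\rangle = \langle pq\|\bar r s\rangle = \langle pq\|r\bar s\rangle = 0,$$

$$\langle \bar p\bar q\|rs\rangle = \langle pq\|\bar r\bar s\rangle = 0,$$

$$\langle p\bar q\|\bar r\bar s\rangle = \langle \bar p q\|\bar r\bar s\rangle = \langle \bar p\bar q\|r\bar s\rangle = \langle \bar p\bar q\|\bar r s\rangle = 0.$$

Thus, out of the 16 possible combinations of spin assignments to the four orbitals in an antisymmetric two-electron integral, 10 of the resulting integrals vanish completely and four are reduced to a single spatial integral. Taking the second-order energy in the canonical RHF case as an example, we have

$$E^{(2)} = \frac{1}{4}\sum_{abij}\frac{\langle ij\|ab\rangle\langle ab\|ij\rangle}{\varepsilon_{ij}^{ab}} \qquad\text{(sum over spinorbitals)}$$

$$E^{(2)} = \frac{1}{4}\sum_{abij}\frac{1}{\varepsilon_{ij}^{ab}}\Big[\langle ij\|ab\rangle\langle ab\|ij\rangle + \langle i\bar j\|a\bar b\rangle\langle a\bar b\|i\bar j\rangle + \langle i\bar j\|\bar a b\rangle\langle \bar a b\|i\bar j\rangle + \langle \bar i j\|a\bar b\rangle\langle a\bar b\|\bar i j\rangle + \langle \bar i j\|\bar a b\rangle\langle \bar a b\|\bar i j\rangle + \langle \bar i\bar j\|\bar a\bar b\rangle\langle \bar a\bar b\|\bar i\bar j\rangle\Big]$$

$$E^{(2)} = \frac{1}{4}\sum_{abij}\frac{1}{\varepsilon_{ij}^{ab}}\Big[\langle ij\|ab\rangle\langle ab\|ij\rangle + \langle ij|v|ab\rangle\langle ab|v|ij\rangle + \langle ij|v|ba\rangle\langle ab|v|ji\rangle + \langle ij|v|ba\rangle\langle ab|v|ji\rangle + \langle ij|v|ab\rangle\langle ab|v|ij\rangle + \langle ij\|ab\rangle\langle ab\|ij\rangle\Big]$$

where in the last two expressions the sums run over spatial orbitals and, in the last one, $\langle ij\|ab\rangle = \langle ij|v|ab\rangle - \langle ij|v|ba\rangle$ is built from spatial integrals.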

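Collecting equal terms,

$$E^{(2)} = \frac{1}{4}\sum_{abij}\frac{1}{\varepsilon_{ij}^{ab}}\Big[2\langle ij\|ab\rangle\langle ab\|ij\rangle + 2\langle ij|v|ab\rangle\langle ab|v|ij\rangle + 2\langle ij|v|ba\rangle\langle ab|v|ji\rangle\Big]$$

$$E^{(2)} = \frac{1}{2}\sum_{abij}\frac{1}{\varepsilon_{ij}^{ab}}\Big[\big(\langle ij|v|ab\rangle - \langle ij|v|ba\rangle\big)\big(\langle ab|v|ij\rangle - \langle ab|v|ji\rangle\big) + \langle ij|v|ab\rangle\langle ab|v|ij\rangle + \langle ij|v|ba\rangle\langle ab|v|ji\rangle\Big]$$

$$E^{(2)} = \frac{1}{2}\sum_{abij}\frac{1}{\varepsilon_{ij}^{ab}}\Big[2\langle ij|v|ab\rangle\langle ab|v|ij\rangle + 2\langle ij|v|ba\rangle\langle ab|v|ji\rangle - \langle ij|v|ab\rangle\langle ab|v|ji\rangle - \langle ij|v|ba\rangle\langle ab|v|ij\rangle\Big]$$

Exchanging the electron labels in the second factor of the second and third terms of the above equation, $\langle ab|v|ji\rangle = \langle ba|v|ij\rangle$, gives

$$E^{(2)} = \frac{1}{2}\sum_{abij}\frac{1}{\varepsilon_{ij}^{ab}}\Big[2\langle ij|v|ab\rangle\langle ab|v|ij\rangle + 2\langle ij|v|ba\rangle\langle ba|v|ij\rangle - \langle ij|v|ab\rangle\langle ba|v|ij\rangle - \langle ij|v|ba\rangle\langle ab|v|ij\rangle\Big]$$

where the summations are over the distinct spatial orbitals only. Since $a$, $b$ are dummy summation indices and can be interchanged, we find that the first two terms in the brackets are equal (after summation), and so are the third and fourth. Thus

$$E^{(2)} = \sum_{abij}\frac{1}{\varepsilon_{ij}^{ab}}\Big[2\langle ij|v|ab\rangle\langle ab|v|ij\rangle - \langle ij|v|ab\rangle\langle ba|v|ij\rangle\Big]$$

$$E^{(2)} = \sum_{abij}\frac{\langle ij|v|ab\rangle\big[2\langle ab|v|ij\rangle - \langle ba|v|ij\rangle\big]}{\varepsilon_{ij}^{ab}}$$

Similar treatments hold for other terms.

IX. COUPLED CLUSTER THEORY

The coupled cluster theory was introduced in 1960 by Coester and Kümmel for calculating nuclear binding energies. In 1966 J. Čížek, and later Čížek together with J. Paldus, reformulated the method for electron correlation in atoms and molecules.

A. Exponential ansatz

$$\Psi_{cc} = \Omega\,\Phi_0, \qquad \Psi_{cc} = e^{\hat T}\Phi_0$$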

where $\Omega$ is often called the wave operator, as it takes an unperturbed solution into the exact solution, and

$$\hat T = \hat T_1 + \hat T_2 + \hat T_3 + \dots + \hat T_m$$

where

$$\hat T_1 = \sum_{ia} t_i^a\,\{\hat a^\dagger \hat i\}, \qquad \hat T_2 = \frac{1}{(2!)^2}\sum_{ijab} t_{ij}^{ab}\,\{\hat a^\dagger \hat b^\dagger \hat j\hat i\}, \qquad \dots, \qquad \hat T_m = \frac{1}{(m!)^2}\sum_{ij\dots ab\dots} t_{ij\dots}^{ab\dots}\,\{\hat a^\dagger \hat b^\dagger \dots \hat j\hat i\}$$

where $m \le N$ and $N$ is the number of electrons. The $t_{ij\dots}^{ab\dots}$ are coefficients to be determined, usually referred to as "amplitudes" of the corresponding operators. Also,

$$t_{ij}^{ab} = -t_{ji}^{ab} = -t_{ij}^{ba} = t_{ji}^{ba}$$

The simplest coupled cluster approach is that of coupled cluster doubles (CCD), in which $\hat T$ is truncated to

$$\hat T_{CCD} = \hat T_2$$

The most common extension of this model is coupled cluster singles and doubles (CCSD), defined by

$$\hat T_{CCSD} = \hat T_1 + \hat T_2$$

and similarly

$$\hat T_{CCSDT} = \hat T_1 + \hat T_2 + \hat T_3$$

B. Size consistency

Consider a system AB composed of two non-interacting components A and B,

$$\Phi_0(AB) = \Phi_0(A)\Phi_0(B), \qquad T(AB) = T(A) + T(B)$$

then

$$\Psi(AB) = e^{T(AB)}\Phi_0(AB) = e^{T(A)+T(B)}\Phi_0(A)\Phi_0(B)$$

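Since $T(A)$ and $T(B)$ act on different subsystems and commute, the exponential factorizes,

$$\Psi(AB) = e^{T(A)}\Phi_0(A)\,e^{T(B)}\Phi_0(B) = \Psi(A)\Psi(B)$$

This separability of the wavefunction ensures the additivity of the energy:

$$H(AB)\Psi(AB) = [H(A) + H(B)]\Psi(A)\Psi(B) = [E(A) + E(B)]\Psi(A)\Psi(B) = [E(A) + E(B)]\Psi(AB)$$

C. CC method with double excitations

The Schrödinger equation is

$$H\Psi_{CCD} = E_{CCD}\Psi_{CCD} \tag{165}$$

$$\langle\Phi_0|H|\Psi_{CCD}\rangle = E_{CCD}\langle\Phi_0|\Psi_{CCD}\rangle$$

We can put $\langle\Phi_0|\Psi_{CCD}\rangle = 1$ by the choice of intermediate normalization, so that

$$E_{CCD} = \langle\Phi_0|H|\Psi_{CCD}\rangle \tag{166}$$

where $\Psi_{CCD} = e^{T_2}\Phi_0$. To simplify the calculation, we write the total Hamiltonian $H$ in normal-ordered form, i.e. $H = H_N + \langle 0|H|0\rangle = H_N + E_{\rm ref}$, with

$$H_N = F_N + W_N = \sum_{pq} f_{pq}\{\hat p^\dagger \hat q\} + \frac{1}{4}\sum_{pqrs}\langle pq\|rs\rangle\{\hat p^\dagger \hat q^\dagger \hat s\hat r\}, \qquad E_{\rm ref} = \langle 0|H|0\rangle$$

So equation (166) becomes

$$E_{CCD} - E_{\rm ref} = \langle\Phi_0|(H - E_{\rm ref})|\Psi_{CCD}\rangle, \qquad \Delta E_{CCD} = \langle\Phi_0|H_N|\Psi_{CCD}\rangle$$

Writing for simplicity $|\Phi_0\rangle = |0\rangle$, we have

$$\Delta E_{CCD} = \langle 0|H_N e^{T_2}|0\rangle$$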

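$$\Delta E_{CCD} = \langle 0|H_N\left(1 + T_2 + \tfrac{1}{2}T_2^2\right)|0\rangle \tag{167}$$

The first term in the above equation is zero because $H_N$ is in normal-ordered form, and the third term is zero due to the Slater-Condon rules, so we get

$$\Delta E_{CCD} = \langle 0|H_N T_2|0\rangle = \frac{1}{4}\sum_{ijab} t_{ij}^{ab}\,\langle 0|\Big[\sum_{pq} f_{pq}\{\hat p^\dagger \hat q\} + \frac{1}{4}\sum_{pqrs}\langle pq\|rs\rangle\{\hat p^\dagger \hat q^\dagger \hat s\hat r\}\Big]\{\hat a^\dagger \hat b^\dagger \hat j\hat i\}|0\rangle$$

The first (one-body) term is zero, since its two operators cannot be fully contracted with the four operators of $T_2$, and the second term becomes

$$\Delta E_{CCD} = \frac{1}{16}\sum_{ijab}\sum_{pqrs}\langle pq\|rs\rangle\,t_{ij}^{ab}\,\langle 0|\{\hat p^\dagger \hat q^\dagger \hat s\hat r\}\{\hat a^\dagger \hat b^\dagger \hat j\hat i\}|0\rangle$$

Only the fully contracted terms survive. There are four of them, giving

$$\Delta E_{CCD} = \frac{1}{16}\sum_{ijab}\sum_{pqrs}\langle pq\|rs\rangle\,t_{ij}^{ab}\left[\delta_{pi}\delta_{qj}\delta_{sb}\delta_{ra} - \delta_{pi}\delta_{qj}\delta_{sa}\delta_{rb} - \delta_{pj}\delta_{qi}\delta_{sb}\delta_{ra} + \delta_{pj}\delta_{qi}\delta_{sa}\delta_{rb}\right]$$

so

$$\Delta E_{CCD} = \frac{1}{16}\sum_{ijab}\left[\langle ij\|ab\rangle - \langle ij\|ba\rangle - \langle ji\|ab\rangle + \langle ji\|ba\rangle\right]t_{ij}^{ab}$$

Since $\langle ij\|ab\rangle = -\langle ij\|ba\rangle$, the above equation becomes

$$\Delta E_{CCD} = \frac{1}{4}\sum_{ijab}\langle ij\|ab\rangle\,t_{ij}^{ab}$$

To calculate the energy we need the amplitudes $t_{ij}^{ab}$. We can obtain equations for these amplitudes by projecting equation (165) onto all double excitations, i.e.

$$\langle\Phi_{ij}^{ab}|H_N|\Psi_{CCD}\rangle = \Delta E_{CCD}\langle\Phi_{ij}^{ab}|\Psi_{CCD}\rangle$$

$$\langle\Phi_{ij}^{ab}|H_N e^{T_2}|0\rangle = \Delta E_{CCD}\langle\Phi_{ij}^{ab}|e^{T_2}|0\rangle$$

$$\langle\Phi_{ij}^{ab}|H_N\left(1 + T_2 + \tfrac{1}{2}T_2^2\right)|0\rangle = \Delta E_{CCD}\langle\Phi_{ij}^{ab}|\left(1 + T_2 + \tfrac{1}{2}T_2^2\right)|0\rangle$$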

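$$\langle\Phi_{ij}^{ab}|H_N\left(1 + T_2 + \tfrac{1}{2}T_2^2\right)|0\rangle = \Delta E_{CCD}\langle\Phi_{ij}^{ab}|T_2|0\rangle$$

$$\langle\Phi_{ij}^{ab}|H_N\left(1 + T_2 + \tfrac{1}{2}T_2^2\right)|0\rangle = \frac{1}{(2!)^2}\,\Delta E_{CCD}\sum_{i'j'a'b'}\langle\Phi_{ij}^{ab}|\Phi_{i'j'}^{a'b'}\rangle\,t_{i'j'}^{a'b'}$$

$$\langle\Phi_{ij}^{ab}|H_N\left(1 + T_2 + \tfrac{1}{2}T_2^2\right)|0\rangle = \Delta E_{CCD}\,t_{ij}^{ab} \tag{168}$$

Let us evaluate each term on the LHS of equation (168) separately. The first term is equal to

$$\langle\Phi_{ij}^{ab}|H_N|0\rangle = \langle ab\|ij\rangle$$

where we used the Slater-Condon rules. Now let us evaluate the second term,

$$\langle\Phi_{ij}^{ab}|H_N T_2|0\rangle = \frac{1}{4}\sum_{klcd}\langle\Phi_{ij}^{ab}|H_N|\Phi_{kl}^{cd}\rangle\,t_{kl}^{cd} = \frac{1}{4}\sum_{klcd}\langle\Phi_{ij}^{ab}|(F_N + W_N)|\Phi_{kl}^{cd}\rangle\,t_{kl}^{cd}$$

We treat the two parts of this expression separately and call the first one $L_1$ and the second $L_2$, so

$$L_1 = \frac{1}{4}\sum_{klcd}\langle\Phi_{ij}^{ab}|F_N|\Phi_{kl}^{cd}\rangle\,t_{kl}^{cd} = \frac{1}{4}\sum_{pq}\sum_{klcd} f_{pq}\,\langle 0|\{\hat i^\dagger \hat j^\dagger \hat b\hat a\}\{\hat p^\dagger \hat q\}\{\hat c^\dagger \hat d^\dagger \hat l\hat k\}|0\rangle\,t_{kl}^{cd}$$

There are 16 possible full contractions. Four of them give

$$\left[\delta_{ik}\delta_{jl}\delta_{bd}\delta_{ap}\delta_{cq} + \delta_{ik}\delta_{jl}\delta_{ac}\delta_{bp}\delta_{dq} - \delta_{ik}\delta_{jq}\delta_{lp}\delta_{ac}\delta_{bd} - \delta_{iq}\delta_{kp}\delta_{jl}\delta_{ac}\delta_{bd}\right]t_{kl}^{cd}$$

The twelve remaining contractions give contributions equal to the above four, canceling the factor $\frac{1}{4}$, and hence we get

$$L_1 = \sum_c f_{ac}\,t_{ij}^{cb} + \sum_d f_{bd}\,t_{ij}^{ad} - \sum_l f_{lj}\,t_{il}^{ab} - \sum_k f_{ki}\,t_{kj}^{ab}$$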

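Changing the dummy summation indices in some terms and permuting some indices gives the following result:

$$L_1 = \sum_c\left(f_{ac}\,t_{ij}^{cb} - f_{bc}\,t_{ij}^{ca}\right) - \sum_k\left(f_{kj}\,t_{ik}^{ab} - f_{ki}\,t_{jk}^{ab}\right)$$

In the canonical Hartree-Fock case, $f_{pq} = \varepsilon_p\delta_{pq}$, and

$$L_1 = \sum_c\left(\varepsilon_a\delta_{ac}\,t_{ij}^{cb} - \varepsilon_b\delta_{bc}\,t_{ij}^{ca}\right) - \sum_k\left(\varepsilon_j\delta_{jk}\,t_{ik}^{ab} - \varepsilon_i\delta_{ik}\,t_{jk}^{ab}\right)$$

$$L_1 = \left(\varepsilon_a t_{ij}^{ab} - \varepsilon_b t_{ij}^{ba}\right) - \left(\varepsilon_j t_{ij}^{ab} - \varepsilon_i t_{ji}^{ab}\right)$$

$$L_1 = (\varepsilon_a + \varepsilon_b - \varepsilon_i - \varepsilon_j)\,t_{ij}^{ab}$$

For the two-particle part of the linear term, we have to evaluate

$$L_2 = \frac{1}{16}\sum_{pqrs}\sum_{klcd}\langle pq\|rs\rangle\,\langle 0|\{\hat i^\dagger \hat j^\dagger \hat b\hat a\}\{\hat p^\dagger \hat q^\dagger \hat s\hat r\}\{\hat c^\dagger \hat d^\dagger \hat l\hat k\}|0\rangle\,t_{kl}^{cd}$$

To obtain valid full contractions in this case, two of the four operators of the first product must be contracted with operators of the third product. These contractions can be classified into three cases: (a) contract two pairs of hole-index operators, (b) contract two pairs of particle-index operators, (c) contract one pair of each type.

In case (a), the two equivalent ways of contracting $\hat i^\dagger$, $\hat j^\dagger$ with $\hat l$, $\hat k$ give

$$L_{2a} = \frac{1}{8}\sum_{pqrs}\sum_{cd}\langle pq\|rs\rangle\,\langle 0|\{\hat b\hat a\}\{\hat p^\dagger \hat q^\dagger \hat s\hat r\}\{\hat c^\dagger \hat d^\dagger\}|0\rangle\,t_{ij}^{cd}$$

The remaining operators can be fully contracted in four ways,

$$L_{2a} = \frac{1}{8}\sum_{pqrs}\sum_{cd}\langle pq\|rs\rangle\left[\delta_{bq}\delta_{ap}\delta_{ds}\delta_{cr} - \delta_{bq}\delta_{ap}\delta_{cs}\delta_{dr} - \delta_{bp}\delta_{aq}\delta_{ds}\delta_{cr} + \delta_{bp}\delta_{aq}\delta_{cs}\delta_{dr}\right]t_{ij}^{cd}$$

and, since the four terms are equal by the antisymmetry of the integrals,

$$L_{2a} = \frac{1}{2}\sum_{cd}\langle ab\|cd\rangle\,t_{ij}^{cd}$$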
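The collapse of the four Kronecker-delta terms into a single particle-particle ladder contraction is easy to check numerically. The sketch below is an illustration only; it assumes amplitudes stored as `t[i, j, c, d]` and antisymmetrized integrals over virtual orbitals stored as `w_vvvv[a, b, c, d]`, both filled with random antisymmetric test data, and the function names are hypothetical.

```python
import numpy as np

def antisym_pairs(x):
    """Antisymmetrize a 4-index array in its first index pair and in its last pair."""
    x = x - x.transpose(1, 0, 2, 3)
    return x - x.transpose(0, 1, 3, 2)

def l2a_from_deltas(w_vvvv, t):
    """(1/8) sum_cd [ab||cd - ab||dc - ba||cd + ba||dc] t_ij^cd,
    i.e. the four delta terms written out one by one."""
    four_terms = (w_vvvv
                  - w_vvvv.transpose(0, 1, 3, 2)
                  - w_vvvv.transpose(1, 0, 2, 3)
                  + w_vvvv.transpose(1, 0, 3, 2))
    return np.einsum("abcd,ijcd->ijab", four_terms, t) / 8.0

def l2a_compact(w_vvvv, t):
    """Final form: (1/2) sum_cd ab||cd t_ij^cd, returned as an array [i, j, a, b]."""
    return 0.5 * np.einsum("abcd,ijcd->ijab", w_vvvv, t)
```

With `w_vvvv = antisym_pairs(...)` applied to any random array, the two functions should agree to rounding error, which is the statement that the four contractions contribute equally.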

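In case (b), contracting the two pairs of particle-index operators in the same way leaves

$$L_{2b} = \frac{1}{8}\sum_{pqrs}\sum_{kl}\langle pq\|rs\rangle\,\langle 0|\{\hat i^\dagger \hat j^\dagger\}\{\hat p^\dagger \hat q^\dagger \hat s\hat r\}\{\hat l\hat k\}|0\rangle\,t_{kl}^{ab}$$

and by contractions similar to those for $L_{2a}$ we get

$$L_{2b} = \frac{1}{2}\sum_{kl}\langle kl\|ij\rangle\,t_{kl}^{ab}$$

For $L_{2c}$, one hole-index pair and one particle-index pair are contracted between the first and third products in

$$L_{2c} = \frac{1}{16}\sum_{pqrs}\sum_{klcd}\langle pq\|rs\rangle\,\langle 0|\{\hat i^\dagger \hat j^\dagger \hat b\hat a\}\{\hat p^\dagger \hat q^\dagger \hat s\hat r\}\{\hat c^\dagger \hat d^\dagger \hat l\hat k\}|0\rangle\,t_{kl}^{cd}$$

The 16 possible choices fall into four distinct patterns, each occurring four times, so that

$$L_{2c} = \frac{1}{4}\sum_{pqrs}\sum_{kc}\langle pq\|rs\rangle\Big[\langle 0|\{\hat j^\dagger \hat b\}\{\hat p^\dagger \hat q^\dagger \hat s\hat r\}\{\hat c^\dagger \hat k\}|0\rangle\,t_{ik}^{ac} - \langle 0|\{\hat i^\dagger \hat b\}\{\hat p^\dagger \hat q^\dagger \hat s\hat r\}\{\hat c^\dagger \hat k\}|0\rangle\,t_{jk}^{ac} - \langle 0|\{\hat j^\dagger \hat a\}\{\hat p^\dagger \hat q^\dagger \hat s\hat r\}\{\hat c^\dagger \hat k\}|0\rangle\,t_{ik}^{bc} + \langle 0|\{\hat i^\dagger \hat a\}\{\hat p^\dagger \hat q^\dagger \hat s\hat r\}\{\hat c^\dagger \hat k\}|0\rangle\,t_{jk}^{bc}\Big]$$

The first term in the above equation can be fully contracted in four ways, which give equal contributions. After simplifying, we get for the first term

$$-\sum_{kc}\langle bk\|cj\rangle\,t_{ik}^{ac}$$

Applying the same procedure to the other three terms of $L_{2c}$ and combining all the terms, we get

$$L_{2c} = -\sum_{kc}\left(\langle bk\|cj\rangle\,t_{ik}^{ac} - \langle bk\|ci\rangle\,t_{jk}^{ac} - \langle ak\|cj\rangle\,t_{ik}^{bc} + \langle ak\|ci\rangle\,t_{jk}^{bc}\right)$$

After adding $L_1$, $L_{2a}$, $L_{2b}$ and $L_{2c}$, we get

$$L = L_1 + L_{2a} + L_{2b} + L_{2c}$$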
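The four contributions to $L$ translate directly into tensor contractions. The following sketch assembles them with `numpy.einsum` in the canonical case; the storage conventions (`w[p, q, r, s]` for the antisymmetrized integral with occupied spinorbitals first, `t[i, j, a, b]` for the amplitudes, `eps` for the orbital energies) and the function names are assumptions made here for illustration. A useful consistency check is that the result must be antisymmetric under $i \leftrightarrow j$ and under $a \leftrightarrow b$, like the amplitudes themselves.

```python
import numpy as np

def antisym_pairs(x):
    """Antisymmetrize a 4-index array in its first index pair and in its last pair."""
    x = x - x.transpose(1, 0, 2, 3)
    return x - x.transpose(0, 1, 3, 2)

def linear_terms(w, t, eps, nocc):
    """L = L1 + L2a + L2b + L2c for canonical orbitals, returned as [i, j, a, b]."""
    o, v = slice(0, nocc), slice(nocc, None)
    e_o, e_v = eps[o], eps[v]
    # L1 = (e_a + e_b - e_i - e_j) t_ij^ab
    d = (e_v[None, None, :, None] + e_v[None, None, None, :]
         - e_o[:, None, None, None] - e_o[None, :, None, None])
    L1 = d * t
    # L2a = 1/2 sum_cd ab||cd t_ij^cd (particle-particle ladder)
    L2a = 0.5 * np.einsum("abcd,ijcd->ijab", w[v, v, v, v], t)
    # L2b = 1/2 sum_kl kl||ij t_kl^ab (hole-hole ladder)
    L2b = 0.5 * np.einsum("klij,klab->ijab", w[o, o, o, o], t)
    # X_ij^ab = sum_kc bk||cj t_ik^ac; L2c applies the four index permutations
    X = np.einsum("bkcj,ikac->ijab", w[v, o, v, o], t)
    L2c = -(X - X.transpose(1, 0, 2, 3) - X.transpose(0, 1, 3, 2)
            + X.transpose(1, 0, 3, 2))
    return L1 + L2a + L2b + L2c
```

The permutations in `L2c` reproduce the four terms of the ring contribution one by one, so no separate permutation operator is needed.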

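$$L = (\varepsilon_a + \varepsilon_b - \varepsilon_i - \varepsilon_j)\,t_{ij}^{ab} + \frac{1}{2}\sum_{cd}\langle ab\|cd\rangle\,t_{ij}^{cd} + \frac{1}{2}\sum_{kl}\langle kl\|ij\rangle\,t_{kl}^{ab} - \sum_{kc}\left(\langle bk\|cj\rangle\,t_{ik}^{ac} - \langle bk\|ci\rangle\,t_{jk}^{ac} - \langle ak\|cj\rangle\,t_{ik}^{bc} + \langle ak\|ci\rangle\,t_{jk}^{bc}\right)$$

Now we evaluate the quadratic term in equation (168),

$$Q = \langle\Phi_{ij}^{ab}|H_N\left(\tfrac{1}{2}T_2^2\right)|0\rangle = \langle\Phi_{ij}^{ab}|(F_N + W_N)\left(\tfrac{1}{2}T_2^2\right)|0\rangle = \langle\Phi_{ij}^{ab}|F_N\left(\tfrac{1}{2}T_2^2\right)|0\rangle + \langle\Phi_{ij}^{ab}|W_N\left(\tfrac{1}{2}T_2^2\right)|0\rangle$$

The first term in the above equation, containing the one-electron operator, is zero. This becomes clearer in terms of diagrams: since $T_2^2$ corresponds to a quadruple excitation while the target state is a double excitation, we must use a diagram with de-excitation level $-2$, but $F_N$ has de-excitation level at most $-1$, and hence the term vanishes. The second term, containing the two-electron operator, is

$$Q = \frac{1}{2}\langle\Phi_{ij}^{ab}|W_N T_2^2|0\rangle = \frac{1}{8}\sum_{pqrs}\ \sum_{k>l,\,c>d}\ \sum_{m>n,\,e>f}\langle pq\|rs\rangle\,\langle 0|\{\hat i^\dagger \hat j^\dagger \hat b\hat a\}\{\hat p^\dagger \hat q^\dagger \hat s\hat r\}\{\hat c^\dagger \hat d^\dagger \hat l\hat k\}\{\hat e^\dagger \hat f^\dagger \hat n\hat m\}|0\rangle\,t_{kl}^{cd}t_{mn}^{ef}$$

No nonzero contractions are possible between the third and fourth normal products in the above equation. Thus, to obtain nonzero contributions, four of the eight operators in the third and fourth normal products have to be contracted with the first product, and the remaining four with the second product. We shall first consider the case in which the four operators of the first product are contracted with the four operators of the fourth. This term, and the similar one in which the four contractions are between the first and third normal products, represent unlinked contributions, since the set of contractions involving the first normal product is decoupled from the set involving the second. Considering the inequalities in the restricted summations over $m, n, e, f$ and the restriction $i > j$, $a > b$, the contractions between the first and fourth products can be accomplished in only one way, so that $Q_a$ is the term

$$Q_a = \frac{1}{8}\sum_{pqrs}\ \sum_{k>l,\,c>d}\ \sum_{m>n,\,e>f}\langle pq\|rs\rangle\,\langle 0|\{\hat i^\dagger \hat j^\dagger \hat b\hat a\}\{\hat p^\dagger \hat q^\dagger \hat s\hat r\}\{\hat c^\dagger \hat d^\dagger \hat l\hat k\}\{\hat e^\dagger \hat f^\dagger \hat n\hat m\}|0\rangle\,t_{kl}^{cd}t_{mn}^{ef}$$

with the first product fully contracted with the fourth,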

$$Q_a = \frac{1}{8}\sum_{pqrs}\ \sum_{k>l,\,c>d}\langle pq\|rs\rangle\,\langle 0|\{\hat p^\dagger \hat q^\dagger \hat s\hat r\}\{\hat c^\dagger \hat d^\dagger \hat l\hat k\}|0\rangle\,t_{kl}^{cd}t_{ij}^{ab}$$

The remaining operators can be contracted in four possible ways, which give equal contributions, so that

$$Q_a = \frac{1}{2}\sum_{k>l,\,c>d}\langle kl\|cd\rangle\,t_{kl}^{cd}t_{ij}^{ab}$$

The same result is obtained (after renaming the summation indices) for the case in which the four operators of the first product are contracted with those in the third product, and thus we get

$$Q_b = \frac{1}{2}\sum_{k>l,\,c>d}\langle kl\|cd\rangle\,t_{kl}^{cd}t_{ij}^{ab}$$

The remaining terms in the quadratic contribution fall into four classes, depending on the pattern of contractions of the first normal product. In class (a) the two hole operators of the first product are contracted with either the third or the fourth product (i.e. $\hat i^\dagger$ and $\hat j^\dagger$ are contracted with $\hat k$ and $\hat l$, respectively, or with $\hat m$ and $\hat n$, respectively, using ordered sums) while the two particle operators are contracted with the fourth or third product, respectively. These two types of contraction produce equal results, canceling a factor 1/2. Then converting to unrestricted summations adds a factor 1/4, which is later canceled by the four equivalent ways of contracting the remaining operators, giving

$$Q_c = \frac{1}{16}\sum_{pqrs}\sum_{klcd}\langle pq\|rs\rangle\,\langle 0|\{\hat p^\dagger \hat q^\dagger \hat s\hat r\}\{\hat l\hat k\}\{\hat c^\dagger \hat d^\dagger\}|0\rangle\,t_{ij}^{cd}t_{kl}^{ab}$$

$$Q_c = \frac{1}{4}\sum_{klcd}\langle kl\|cd\rangle\,t_{ij}^{cd}t_{kl}^{ab}$$

In class (b) one hole and one particle operator of the first normal product are contracted with operators in the third product, while the remaining two operators are contracted with operators in the fourth. Converting to unrestricted summations, which introduces an additional factor 1/16, we find that there are 64 choices for these contractions. Specifically, there are four ways for $\hat i^\dagger$ and $\hat a$ to be contracted with operators in the third product, while $\hat j^\dagger$ and $\hat b$ can be contracted with operators in the fourth product in four ways, giving 16 equal terms; contracting $\hat i^\dagger$ and $\hat a$ with operators in the fourth product while $\hat j^\dagger$ and $\hat b$ are contracted with operators in the third product gives 16 more terms equal to the above, for a total of 32 equal terms.

Another set of 32 equal terms is obtained by contracting $\hat i^\dagger$ and $\hat b$ with operators in the third product while $\hat j^\dagger$ and $\hat a$ are contracted with operators in the fourth product, or vice versa. In total, after renaming the summation indices and performing the remaining contractions we get

$$Q_d = \frac{1}{4}\sum_{pqrs}\sum_{klcd}\langle pq\|rs\rangle\,\langle 0|\{\hat p^\dagger \hat q^\dagger \hat s\hat r\}\{\hat c^\dagger \hat k\}\{\hat d^\dagger \hat l\}|0\rangle\left(t_{ik}^{ac}t_{jl}^{bd} - t_{ik}^{bc}t_{jl}^{ad}\right)$$

$$Q_d = \sum_{klcd}\langle kl\|cd\rangle\left(t_{ik}^{ac}t_{jl}^{bd} - t_{ik}^{bc}t_{jl}^{ad}\right)$$

$$Q_d = \sum_{klcd}\langle kl\|cd\rangle\left(t_{ik}^{ac}t_{jl}^{bd} + t_{ik}^{bd}t_{jl}^{ac}\right)$$

In classes (c) and (d) three operators of the first normal product are contracted with operators in the third product and one with an operator in the fourth, or vice versa. In class (c) the set of three operators in the first product consists of two particle operators and one hole operator, while in class (d) it consists of one particle operator and two hole operators. Furthermore, each case can be generated in two distinct ways, depending on whether the set of three operators is $\hat i^\dagger\hat a\hat b$ or $\hat j^\dagger\hat a\hat b$ for (c) and $\hat i^\dagger\hat j^\dagger\hat a$ or $\hat i^\dagger\hat j^\dagger\hat b$ for (d). There are 16 possibilities in each case: the set of three operators in the first product can be contracted with operators in the third or the fourth product, and in each case these three contractions can be done in four ways, while the remaining single contraction can be chosen in two ways. The 16 possibilities lead to equivalent results, canceling the factor 1/16 obtained by converting to unrestricted summations. As an example, after this cancellation the first $Q_e$ term can be written in the form

$$-\frac{1}{8}\sum_{pqrs}\sum_{klcd}\langle pq\|rs\rangle\,\langle 0|\{\hat p^\dagger \hat q^\dagger \hat s\hat r\}\{\hat c^\dagger \hat d^\dagger \hat k\}\{\hat l\}|0\rangle\,t_{kj}^{cd}t_{li}^{ab}$$

The sign reflects the odd number of interchanges needed to move all the contracted operators to the front in pairs (note that the summation index $m$ is changed to $l$ after the contraction). The remaining operators can be contracted in four ways, which give equal contributions, so this term becomes

$$-\frac{1}{2}\sum_{klcd}\langle kl\|cd\rangle\,t_{ik}^{ab}t_{jl}^{cd}$$

Similarly, for the second term of $Q_e$ we get

$$-\frac{1}{2}\sum_{klcd}\langle kl\|cd\rangle\,t_{ik}^{cd}t_{jl}^{ab}$$

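and hence we get

$$Q_e = -\frac{1}{2}\sum_{klcd}\langle kl\|cd\rangle\left(t_{ik}^{cd}t_{jl}^{ab} + t_{ik}^{ab}t_{jl}^{cd}\right)$$

and, using the same procedure for case (d),

$$Q_f = -\frac{1}{2}\sum_{klcd}\langle kl\|cd\rangle\left(t_{ij}^{ac}t_{kl}^{bd} + t_{ij}^{bd}t_{kl}^{ac}\right)$$

When all the $Q$ terms are put together (with $Q_a$ and $Q_b$ converted to unrestricted summations), we get

$$Q = \frac{1}{8}\sum_{klcd}\langle kl\|cd\rangle\,t_{kl}^{cd}t_{ij}^{ab} + \frac{1}{8}\sum_{klcd}\langle kl\|cd\rangle\,t_{kl}^{cd}t_{ij}^{ab} + \frac{1}{4}\sum_{klcd}\langle kl\|cd\rangle\,t_{ij}^{cd}t_{kl}^{ab} + \sum_{klcd}\langle kl\|cd\rangle\left(t_{ik}^{ac}t_{jl}^{bd} + t_{ik}^{bd}t_{jl}^{ac}\right) - \frac{1}{2}\sum_{klcd}\langle kl\|cd\rangle\left(t_{ik}^{cd}t_{jl}^{ab} + t_{ik}^{ab}t_{jl}^{cd}\right) - \frac{1}{2}\sum_{klcd}\langle kl\|cd\rangle\left(t_{ij}^{ac}t_{kl}^{bd} + t_{ij}^{bd}t_{kl}^{ac}\right)$$

Substituting all these results into equation (168), the first two terms of $Q$ add up to $\Delta E_{CCD}\,t_{ij}^{ab}$ and cancel the right-hand side. Moving $L_1 = -\varepsilon_{ij}^{ab}t_{ij}^{ab}$, with $\varepsilon_{ij}^{ab} = \varepsilon_i + \varepsilon_j - \varepsilon_a - \varepsilon_b$, to the left-hand side, we get the CCD amplitude equation

$$\varepsilon_{ij}^{ab}t_{ij}^{ab} = \langle ab\|ij\rangle + \frac{1}{2}\sum_{cd}\langle ab\|cd\rangle\,t_{ij}^{cd} + \frac{1}{2}\sum_{kl}\langle kl\|ij\rangle\,t_{kl}^{ab} - \sum_{kc}\left(\langle bk\|cj\rangle\,t_{ik}^{ac} - \langle bk\|ci\rangle\,t_{jk}^{ac} - \langle ak\|cj\rangle\,t_{ik}^{bc} + \langle ak\|ci\rangle\,t_{jk}^{bc}\right)$$
$$\qquad + \frac{1}{4}\sum_{klcd}\langle kl\|cd\rangle\,t_{ij}^{cd}t_{kl}^{ab} + \sum_{klcd}\langle kl\|cd\rangle\left(t_{ik}^{ac}t_{jl}^{bd} + t_{ik}^{bd}t_{jl}^{ac}\right) - \frac{1}{2}\sum_{klcd}\langle kl\|cd\rangle\left(t_{ik}^{cd}t_{jl}^{ab} + t_{ik}^{ab}t_{jl}^{cd}\right) - \frac{1}{2}\sum_{klcd}\langle kl\|cd\rangle\left(t_{ij}^{ac}t_{kl}^{bd} + t_{ij}^{bd}t_{kl}^{ac}\right)$$

To solve this equation, we write it schematically as

$$\varepsilon_{ij}^{ab}t_{ij}^{ab} = \langle ab\|ij\rangle + L(t) + Q(tt)$$

where $L(t)$ and $Q(tt)$ denote the terms linear and quadratic in the amplitudes, respectively. The question now is how to solve for the amplitudes $t$. Here we will use an iterative method. First we set $L(t)$ and $Q(tt)$ equal to zero. Thus the first approximation to $t_{ij}^{ab}$ is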

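$$t_{ij}^{ab} = \frac{\langle ab\|ij\rangle}{\varepsilon_{ij}^{ab}}$$

This gives an estimate of each amplitude. These approximate values are then substituted back into the right-hand side to evaluate the left-hand side, and so forth. Finally, one achieves self-consistency of the iterative process and obtains the CC function for the ground state of the system. A more efficient way is to take the initial amplitudes from a short CI expansion, with subsequent linearization of the terms containing the initial (known) amplitudes.

D. Equivalence of CC and MBPT theory

Here we will show that the CC form of the wave function can be derived from the infinite-order MBPT wave function. The total wave function in MBPT can be written as

$$\Psi_{MBPT} = \sum_{n=0}^{\infty}(R_0\hat V_N)^n|0\rangle = \Phi_0 + \Psi^{(1)} + \Psi^{(2)} + \dots$$

where the superscripts indicate the order in $V_N$ and where $V_N = F_N^{0} + W$. The algebraic expressions for the individual orders of $\Psi$ can be written as

$$\Psi^{(1)} = \frac{1}{4}\sum_{abij}\frac{\langle ab\|ij\rangle}{\varepsilon_{ij}^{ab}}\,\Phi_{ij}^{ab} + \sum_{ai}\frac{\langle a|\hat f|i\rangle}{\varepsilon_i^a}\,\Phi_i^a$$

$$\Psi^{(2)} = \frac{1}{8}\sum_{abcdij}\frac{\langle ab\|cd\rangle\langle cd\|ij\rangle}{\varepsilon_{ij}^{ab}\varepsilon_{ij}^{cd}}\,\Phi_{ij}^{ab} - \sum_{abcijk}\frac{\langle ak\|cj\rangle\langle cb\|ik\rangle}{\varepsilon_{ij}^{ab}\varepsilon_{ik}^{bc}}\,\Phi_{ij}^{ab} + \frac{1}{4}\sum_{abcdijk}\frac{\langle bc\|dk\rangle\langle ad\|ij\rangle}{\varepsilon_{ijk}^{abc}\varepsilon_{ij}^{ad}}\,\Phi_{ijk}^{abc} - \frac{1}{4}\sum_{abcijkl}\frac{\langle lc\|jk\rangle\langle ab\|il\rangle}{\varepsilon_{ijk}^{abc}\varepsilon_{il}^{ab}}\,\Phi_{ijk}^{abc}$$
$$\qquad + \frac{1}{2}\sum_{abcij}\frac{\langle aj\|cb\rangle\langle cb\|ij\rangle}{\varepsilon_{ij}^{cb}\varepsilon_i^a}\,\Phi_i^a + \frac{1}{16}\sum_{abcdijkl}\frac{\langle cd\|kl\rangle\langle ab\|ij\rangle}{\varepsilon_{ijkl}^{abcd}\varepsilon_{ij}^{ab}}\,\Phi_{ijkl}^{abcd} + \dots$$

$\Psi^{(1)}$, $\Psi^{(2)}$, etc. contain expressions corresponding to both connected and disconnected diagrams. Let the class $r = 1$ contain all the expressions corresponding to connected wave function diagrams. We can represent this class by the connected operator $\hat T$, where

$$\hat T|0\rangle = \sum_{n=1}^{\infty}\left[(R_0\hat V_N)^n|0\rangle\right]_C, \qquad \hat T = \sum_{m=1}^{N}\hat T_m$$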
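A simple numerical illustration of the connection between the two theories: inserting the first-order amplitudes $t_{ij}^{ab(1)} = \langle ab\|ij\rangle/\varepsilon_{ij}^{ab}$ (the first iterate of the CCD equations) into the CCD energy expression $\Delta E = \frac{1}{4}\sum\langle ij\|ab\rangle t_{ij}^{ab}$ reproduces the second-order MBPT energy. The sketch below assumes random real antisymmetrized integrals and made-up orbital energies as test data; the function names and the storage convention `w[p, q, r, s]` (occupied spinorbitals first) are hypothetical.

```python
import numpy as np

def random_antisym_integrals(n, seed=0):
    """Random real w[p,q,r,s] (hypothetical data), antisymmetric in p,q and in r,s
    and symmetric under exchange of bra and ket."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=(n, n, n, n))
    w = w - w.transpose(1, 0, 2, 3)
    w = w - w.transpose(0, 1, 3, 2)
    return w + w.transpose(2, 3, 0, 1)

def ccd_energy_first_iterate(w, eps, nocc):
    """Return (CCD energy with first-order amplitudes, MP2 energy, amplitudes)."""
    o, v = slice(0, nocc), slice(nocc, None)
    e_o, e_v = eps[o], eps[v]
    # eps_ij^ab = e_i + e_j - e_a - e_b, stored as [i, j, a, b]
    denom = (e_o[:, None, None, None] + e_o[None, :, None, None]
             - e_v[None, None, :, None] - e_v[None, None, None, :])
    # first-order amplitudes t_ij^ab = ab||ij / eps_ij^ab, stored as [i, j, a, b]
    t_first = w[v, v, o, o].transpose(2, 3, 0, 1) / denom
    e_cc = 0.25 * np.einsum("ijab,ijab->", w[o, o, v, v], t_first)
    e_mp2 = 0.25 * np.sum(w[o, o, v, v] ** 2 / denom)
    return e_cc, e_mp2, t_first
```

Because all denominators are negative when the occupied orbital energies lie below the virtual ones, both returned energies are negative, as expected for a second-order ground-state correction.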

109 We can see that in perturbation theory, infinite order of connected form corresponds to T m. so we can write N T m = m=1 T (n) m T m 0 = {(R 0 ˆV N ) n 0 } C,m and hence the corresponding expansion for amplitudes is t abc... ijk... = n=0 t abc...(n) ijk... As an example, we can write the expression corresponding to T (1) 2 is where T (1) 2 = 1 4 abij t ab(1) ij ab ij ε ab ij = ab ij ε ab ij Initial few terms in the expansion of T 1, T 2 interms of MBPT expressions are given Φ ab ij T 1 0 = (T (1) 1 + T (2) 1 + T (3) ) 0 ( a f = ˆ i a i + 1 aj cb cb ij a i 1 aj cb cb ij 2 2 ai ε a i abcij ε cb ij εa i abcij T 2 0 = (T (1) 2 + T (2) 2 + T (3) ) 0 ε cb ij εa i ) a i ( 1 ab ij = a b ji abij ε ab ij ab cd cd ij a b ak cj cb ik ji abcdij εij abεcd ij abcijk ε ab ij εbc ik ) a b ji similarly we can write for T 3 and other terms. Now we will show that r = 2(having two disconnected parts) expression of MBPT corresponds to the square terms in CC theory. we will illustrate this, for m=2, r=2 and in first order. We can write the last term of Ψ (2) as 1 16 abcdijkl cd kl ab ij = εijkl abcdεab ij abcdijkl ( cd kl ab ij εijkl abcdεab ij + 1 εijkl abcdεcd kl ) {a b ji}{c d lk}

$$= \frac{1}{32}\sum_{abcdijkl}\left(\frac{1}{(\varepsilon_{ij}^{ab}+\varepsilon_{kl}^{cd})\,\varepsilon_{ij}^{ab}}+\frac{1}{(\varepsilon_{ij}^{ab}+\varepsilon_{kl}^{cd})\,\varepsilon_{kl}^{cd}}\right)\langle cd\|kl\rangle\langle ab\|ij\rangle\,\{a^\dagger b^\dagger j i\}\{c^\dagger d^\dagger l k\}$$

After simplification, we get

$$= \frac{1}{32}\sum_{abcdijkl}\frac{\langle cd\|kl\rangle\langle ab\|ij\rangle}{\varepsilon_{kl}^{cd}\,\varepsilon_{ij}^{ab}}\,\{a^\dagger b^\dagger j i\}\{c^\dagger d^\dagger l k\} = \frac{1}{2}\left(T_2^{(1)}\right)^2 .$$

If we collect the remaining terms in a similar fashion, we eventually recover all the terms of the exponential in CC theory, i.e.

$$\Psi_{\mathrm{MBPT}} = e^{\hat T}|0\rangle$$

E. Noniterative triple excitations correction

The CCSDT (coupled cluster with single, double, and triple excitations) approximation is more accurate than CCSD, but its cost scales as $N^8$, which makes it very expensive computationally. To reduce the cost, MBPT is used to estimate the effect of connected triples instead of carrying out the full CCSDT calculation. The result is the well-known (T) correction, and the method is called CCSD(T). We now show how MBPT can be used to obtain the (T) correction for connected triple excitations. We can decompose the normal-ordered Hamiltonian $H_N$ as follows:

$$H_N = H_0 + H_1 = F_N + V_N$$

where the zeroth-order component of the Hamiltonian is taken to be the Fock operator, so that the perturbation is the remaining two-electron operator $V_N$. We can also decompose the cluster operators by perturbation order, as in the previous section, i.e.

$$T_m = T_m^{(1)} + T_m^{(2)} + T_m^{(3)} + \dots \tag{169}$$

We define the similarity-transformed Hamiltonian as

$$\bar H = e^{-T} H_N e^{T}$$

For $T = T_1 + T_2 + T_3$, the above equation takes the form

$$\bar H = \left(H_N + H_N T_1 + H_N T_2 + H_N T_3 + \tfrac{1}{2} H_N T_1^2 + \tfrac{1}{2} H_N T_2^2 + \tfrac{1}{2} H_N T_3^2 + H_N T_1 T_2 + H_N T_1 T_3 + \dots\right)_C$$

The proof of this relation is left as a homework problem. In the previous section we proved the equivalence between CC theory and MBPT in terms of the wave

111 function so in a similar way we can show the equivalence in terms of energy. The CCSD energy contains contributions identical to those of MBPT(2) and MBPT(3) energy, but lacks triple excitation contribution necessary for MBPT(4). Thus a natural approach to the "triples problem" is to correct the CCSD energy for the missing MBPT(4) terms using the CCSDT similarity-transformed Hamiltonian, H = e T 1 T 2 T 3 H N e T 1+T 2 +T 3 For m=1,2,3 in equation (169) and then plug in the above equation we get H = H (0) + H (1) + H (2) +... Here we are interested in calculating E (4) so we will need H (4) which is H (4) = (V N T 3 2 ) C so plugging the V N and T 3 2 E (4) = 0 (V N T 3 2 ) C 0 operator in above equation and using Wick s theorem, we get E (4) = 1 4 tab(3) ij ij ab (170) ijab in above equation, t ab(3) ij is not known and need to be calculated as follows 0 = Φ ab ij H (3) 0 0 = Φ ab ij (F N T V N T V N T V N T V N (T 1 2 )2 ) 0 we can see from above equation that it contains T3 2 amplitude. so in order to find this amplitude, we have and hence will involve tabc(2) ijk 0 = Φ abc ijk H (2) 0 0 = Φ abc ijk V N T V N T V N (T 1 2 )2 0 after plugging the operator and using Wick s theorem we get εijk abc tabc(2) ijk = P (i ij)p (a bc) bc dk t ad(1) ij P (i kj)p (c ab) lc jk t ab(1) il (171) d where P (p qr) permutation operators perform anti-symmetric permutations of index p with indices q and r. These T 2 3 amplitudes may then be used to compute the T l

amplitudes, which may then be used in equation (170) to compute the triple-excitation contribution to the fourth-order energy, $E_T^{(4)}$. The corrected CCSD energy is

$$E_{\mathrm{CCSD+T(4)}} = E_{\mathrm{CCSD}} + E_T^{(4)}$$

and the approach is referred to as the CCSD+T(4) method. If, on the other hand, one chooses to use the converged CCSD $T_2$ amplitudes rather than the first-order $T_2$ amplitudes in equation (171), one obtains a different correction, called CCSD+T(CCSD) or CCSD[T]:

$$E_{\mathrm{CCSD+T(CCSD)}} = E_{\mathrm{CCSD}} + E_T^{[4]}$$

This approach is reported to give quantitatively incorrect predictions of molecular properties for some systems. In 1989 a similar analysis was developed by Raghavachari et al., who determined that a fifth-order energy contribution involving single excitations, denoted $E_{ST}^{[5]}$, should be included in the CCSD correction as well. This component may be derived from the second-order $T_3$ contribution to the third-order $T_1$ operator, which subsequently contributes to the fourth-order $T_2$. Although the diagrammatic techniques described above are particularly convenient for deriving $E_{ST}^{[5]}$, here we simply present the final equation

$$E_{ST}^{[5]} = \frac{1}{4}\sum_{ijkabc}\langle jk\|bc\rangle\, t_i^a\, t_{ijk}^{abc}$$

where the triple-excitation amplitudes are determined using a modified form of Eq. (171) that includes the converged $T_2$ amplitudes:

$$\varepsilon_{ijk}^{abc}\, t_{ijk}^{abc(2)} = P(i/jk)\,P(a/bc)\left[\sum_d \langle bc\|di\rangle\, t_{jk}^{ad} - \sum_l \langle la\|jk\rangle\, t_{il}^{bc}\right]$$

Hence, the total CCSD(T) energy may be succinctly written as

$$E_{\mathrm{CCSD(T)}} = E_{\mathrm{CCSD}} + E_T^{[4]} + E_{ST}^{[5]}$$

This method of energy calculation is called the CCSD(T) approach and is regarded as the "gold standard" of quantum chemistry.

The second method of solving the amplitude equations is the multivariable Newton-Raphson method. The amplitude equations are nonlinear. We can write them in matrix form (by defining $t_{ij}^{ab}$ as the $(ij,ab)$ element of the column vector $\mathbf{t}$) as

$$0 = \mathbf{a} + \mathbf{b}\,\mathbf{t} + \mathbf{c}\,\mathbf{t}\mathbf{t} \tag{172}$$

where the elements of $\mathbf{a}$ are $\langle ab\|ij\rangle$.
The solution of these nonlinear algebraic equations poses a substantial difficulty in implementing coupled cluster theory.

To solve the nonlinear amplitude equations, we choose $\mathbf{t}$ such that the vector $\mathbf{f}(\mathbf{t})$, defined as

$$\mathbf{f}(\mathbf{t}) \equiv \mathbf{a} + \mathbf{b}\,\mathbf{t} + \mathbf{c}\,\mathbf{t}\mathbf{t} \tag{173}$$

becomes equal to zero. This is done by expanding $\mathbf{f}(\mathbf{t})$ about a point $\mathbf{t}_0$. Keeping only the linear terms in this Taylor expansion and setting $\mathbf{f}(\mathbf{t})$ equal to zero, one obtains equations for the changes $\Delta\mathbf{t}$ in the amplitudes:

$$f_{ij}^{ab}(\mathbf{t}) = 0 = f_{ij}^{ab}(\mathbf{t}_0) + \sum_{klcd}\left(\frac{\partial f_{ij}^{ab}}{\partial t_{kl}^{cd}}\right)_{\mathbf{t}_0}\Delta t_{kl}^{cd} \tag{174}$$

The step lengths (corrections to $\mathbf{t}_0$) are obtained by solving this set of linear equations and are then used to update the amplitudes:

$$\mathbf{t} = \mathbf{t}_0 + \Delta\mathbf{t}$$

These values of $\mathbf{t}$ can then be used as a new $\mathbf{t}_0$ vector for the next application of Eq. (174). This multidimensional Newton-Raphson procedure, which involves the solution of a large number of coupled linear equations, is repeated until the $\Delta\mathbf{t}$ values are sufficiently small (convergence). Although the first applications of the coupled cluster method to quantum chemistry did employ this Newton-Raphson scheme, the numerical problems involved in solving the large multivariable inhomogeneous equations (174) have led more recent workers to use the perturbative techniques discussed already.

Other methods were also devised to solve the nonlinear equations of coupled cluster theory. One such method within the perturbative framework is the reduced linear equations technique, developed by Purvis and Bartlett. Although this method was designed to solve large systems of linear equations efficiently, it can also be applied to the nonlinear coupled cluster equations by assuming an approximate linearization of the nonlinear terms.

F. Full triple and higher excitations

Owing to the ongoing growth in computational resources, it is nowadays often possible to perform full coupled cluster singles, doubles, and triples (CCSDT) calculations in cases that demand very high accuracy. This method was first formulated in … The complete inclusion of $T_3$ makes the method considerably more demanding: its cost scales as $N^8$.
However, in many cases it is necessary to go beyond this level and to include correlation effects beyond CCSDT. The next method in the hierarchy is CC with single, double, triple, and quadruple excitations (CCSDTQ), which is very expensive, with a computational scaling of $N^{10}$. It is therefore of interest to explore methods intermediate between CCSDT and CCSDTQ with a reduced cost scaling. The CC wave function including quadruple excitations is given by $\Psi_{\mathrm{CCSDTQ}} = \exp(T_1 + T_2 + T_3 + T_4)\,|0\rangle$.
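The two strategies described in this chapter for solving the amplitude equations, the simple iteration that starts from $t = a/\varepsilon$ and the Newton-Raphson update of Eqs. (173)-(174), can be compared on a toy problem. The sketch below is only an illustration under stated assumptions: it uses two "amplitudes", and the arrays `EPS`, `A`, `LIN`, and `QUAD` are made-up numbers standing in for the denominators, the integrals $\langle ab\|ij\rangle$, and the linear and quadratic couplings. It is not an actual CCD implementation.

```python
"""Toy model of the amplitude equations  eps_p * t_p = a_p + L_p(t) + Q_p(tt)."""

# All numbers below are hypothetical stand-ins chosen only for illustration.
EPS = [2.0, 3.0]                       # play the role of the denominators
A = [0.30, 0.20]                       # play the role of <ab||ij>
LIN = [[0.10, 0.05], [0.05, 0.10]]     # coefficients of the linear term L(t)
QUAD = [[[0.02, 0.01], [0.01, 0.03]],
        [[0.01, 0.02], [0.02, 0.01]]]  # coefficients of the quadratic term Q(tt)


def rhs(t):
    """Return a + L(t) + Q(tt) for both amplitudes."""
    out = []
    for p in range(2):
        lin = sum(LIN[p][q] * t[q] for q in range(2))
        quad = sum(QUAD[p][q][r] * t[q] * t[r]
                   for q in range(2) for r in range(2))
        out.append(A[p] + lin + quad)
    return out


def residual(t):
    """f(t) = a + L(t) + Q(tt) - eps * t; the solution has f(t) = 0."""
    r = rhs(t)
    return [r[p] - EPS[p] * t[p] for p in range(2)]


def jacobian(t):
    """Matrix of derivatives d f_p / d t_s at the point t."""
    jac = [[0.0, 0.0], [0.0, 0.0]]
    for p in range(2):
        for s in range(2):
            d_quad = sum((QUAD[p][s][r] + QUAD[p][r][s]) * t[r] for r in range(2))
            jac[p][s] = LIN[p][s] + d_quad - (EPS[p] if p == s else 0.0)
    return jac


def solve_fixed_point(tol=1e-12, max_iter=200):
    """Substitute the current amplitudes into the right-hand side repeatedly."""
    t = [A[p] / EPS[p] for p in range(2)]   # first approximation: L = Q = 0
    for n in range(1, max_iter + 1):
        r = rhs(t)
        new = [r[p] / EPS[p] for p in range(2)]
        if max(abs(new[p] - t[p]) for p in range(2)) < tol:
            return new, n
        t = new
    raise RuntimeError("fixed-point iteration did not converge")


def solve_newton(tol=1e-12, max_iter=50):
    """Newton-Raphson: solve J * dt = -f(t0) and update t = t0 + dt."""
    t = [A[p] / EPS[p] for p in range(2)]
    for n in range(1, max_iter + 1):
        f = residual(t)
        jac = jacobian(t)
        det = jac[0][0] * jac[1][1] - jac[0][1] * jac[1][0]
        # Cramer's rule for the 2x2 linear system J * dt = -f
        dt0 = (-f[0] * jac[1][1] + jac[0][1] * f[1]) / det
        dt1 = (-jac[0][0] * f[1] + jac[1][0] * f[0]) / det
        t = [t[0] + dt0, t[1] + dt1]
        if max(abs(dt0), abs(dt1)) < tol:
            return t, n
    raise RuntimeError("Newton-Raphson did not converge")


if __name__ == "__main__":
    print("fixed point:", solve_fixed_point())
    print("Newton     :", solve_newton())
```

In this weakly coupled model both schemes reach the same amplitudes. Newton-Raphson needs fewer iterations, but each of them requires building the Jacobian and solving a linear system, which is the expensive step for realistic numbers of amplitudes.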

X. LINEAR RESPONSE THEORY

Linear response theory is used when a system of electrons is subject to a small perturbation, for example an electric or magnetic field from a probe in an experiment. The response properties of a system determine its screening (dielectric) properties and can be used to study its excited states. The density-density response function determines the second-order dispersion energy in symmetry-adapted perturbation theory (SAPT) and the exact exchange-correlation energy $E_{xc}$ of density functional theory.

A. Response function

Let us consider a system in the ground state $\Psi_0$ of a Hamiltonian $\hat H_0$, i.e., $\hat H_0 \Psi_0 = E_0 \Psi_0$. Let $\hat A$ be an observable of the system; its expectation value is $A_0 = \langle\Psi_0|\hat A|\Psi_0\rangle$. The time evolution operator in this situation is $\hat U(t,t_0) = e^{-i(t-t_0)\hat H_0}$, and therefore the wave function only acquires the time-dependent phase $e^{-i(t-t_0)E_0}$. Let the system be subject to a perturbation $\hat H_1(t) = F(t)\hat B$, where $F(t)$ is a time-dependent field coupled to an observable $\hat B$ of the system. The total Hamiltonian becomes $\hat H(t) = \hat H_0 + \hat H_1(t)$. As a consequence, the expectation value of $\hat A$ in general also becomes time dependent, i.e.,

$$A(t) = \langle\Psi(t)|\hat A|\Psi(t)\rangle, \qquad t > t_0. \tag{175}$$

The difference $A(t) - A_0$ is called the response of $\hat A$ to the perturbation $\hat H_1(t)$. The response can in general be written as

$$A(t) - A_0 = A_1(t) + A_2(t) + A_3(t) + \dots, \tag{176}$$

where $A_1(t)$ is the change that is first-order (linear) in the perturbation $\hat H_1(t)$, $A_2(t)$ is second-order (quadratic), and so on. We will limit the discussion to the linear response $A_1(t)$. The time evolution operator can now be written as $\hat U(t,t_0) = e^{-i(t-t_0)\hat H_0}\,\hat U_1(t,t_0)$. The state of the system after the perturbation is applied evolves as

$$\Psi(t) = \hat U(t,t_0)\Psi_0 = e^{-i(t-t_0)\hat H_0}\,\hat U_1(t,t_0)\Psi_0.$$
(177)

The time-dependent Schrödinger equation gives

$$i\frac{\partial \Psi(t)}{\partial t} = \left[\hat H_0 + \hat H_1(t)\right]\Psi(t) \tag{178}$$

$$i\frac{\partial}{\partial t}\hat U_1(t,t_0) = e^{i(t-t_0)\hat H_0}\,\hat H_1(t)\,e^{-i(t-t_0)\hat H_0}\,\hat U_1(t,t_0) \tag{179}$$

$$\hat U_1(t,t_0) = \hat U_1(t_0,t_0) - i\int_{t_0}^{t} dt'\, e^{i(t'-t_0)\hat H_0}\,\hat H_1(t')\,e^{-i(t'-t_0)\hat H_0}\,\hat U_1(t',t_0) \tag{180}$$

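As the perturbation is turned on at $t = t_0$, we have $\hat U_1(t_0,t_0) = 1$. Therefore,

$$\hat U_1(t,t_0) = 1 - i\int_{t_0}^{t} dt'\, e^{i(t'-t_0)\hat H_0}\,\hat H_1(t')\,e^{-i(t'-t_0)\hat H_0}\,\hat U_1(t',t_0) \tag{181}$$

This integral equation can be solved iteratively. The zeroth-order solution is $\hat U_1^{(0)}(t,t_0) = 1$; thus, the first-order solution is

$$\hat U_1^{(1)}(t,t_0) = 1 - i\int_{t_0}^{t} dt'\, e^{i(t'-t_0)\hat H_0}\,\hat H_1(t')\,e^{-i(t'-t_0)\hat H_0}. \tag{182}$$

The time evolution operator to first order in the perturbation is

$$\hat U^{(1)}(t,t_0) = e^{-i(t-t_0)\hat H_0}\left[1 - i\int_{t_0}^{t} dt'\, e^{i(t'-t_0)\hat H_0}\,\hat H_1(t')\,e^{-i(t'-t_0)\hat H_0}\right]. \tag{183}$$

The linear response can then be written as

$$A_1(t) = A(t) - A_0 = \langle\Psi(t)|\hat A|\Psi(t)\rangle - \langle\Psi_0|\hat A|\Psi_0\rangle = \langle\Psi_0|\hat U^{(1)\dagger}(t,t_0)\,\hat A\,\hat U^{(1)}(t,t_0)|\Psi_0\rangle - \langle\Psi_0|\hat A|\Psi_0\rangle \tag{184}$$

With the notation $\hat A(t-t_0) = e^{i(t-t_0)\hat H_0}\hat A\,e^{-i(t-t_0)\hat H_0}$ and $\hat B(t'-t_0) = e^{i(t'-t_0)\hat H_0}\hat B\,e^{-i(t'-t_0)\hat H_0}$, and with $\hat H_1(t') = F(t')\hat B$, consider

$$\begin{aligned}
\hat U^{(1)\dagger}(t,t_0)\,\hat A\,\hat U^{(1)}(t,t_0)
&= \left[1 + i\int_{t_0}^{t} dt'\,F(t')\hat B(t'-t_0)\right] e^{+i(t-t_0)\hat H_0}\hat A\,e^{-i(t-t_0)\hat H_0}\left[1 - i\int_{t_0}^{t} dt'\,F(t')\hat B(t'-t_0)\right] \\
&= \left[1 + i\int_{t_0}^{t} dt'\,F(t')\hat B(t'-t_0)\right]\hat A(t-t_0)\left[1 - i\int_{t_0}^{t} dt'\,F(t')\hat B(t'-t_0)\right] \\
&= \hat A(t-t_0) - i\int_{t_0}^{t} dt'\,F(t')\,\hat A(t-t_0)\hat B(t'-t_0) + i\int_{t_0}^{t} dt'\,F(t')\,\hat B(t'-t_0)\hat A(t-t_0) + \hat O(F^2) \\
&= \hat A(t-t_0) - i\int_{t_0}^{t} dt'\,F(t')\left\{\hat A(t-t_0)\hat B(t'-t_0) - \hat B(t'-t_0)\hat A(t-t_0)\right\} + \hat O(F^2) \\
&= \hat A(t-t_0) - i\int_{t_0}^{t} dt'\,F(t')\left[\hat A(t-t_0),\hat B(t'-t_0)\right] + \hat O(F^2),
\end{aligned} \tag{185}$$

where $[\hat A(t-t_0),\hat B(t'-t_0)]$ is a commutator. Using Eq. (185) in Eq. (184) and keeping terms up to first order in the field $F$, we get

$$\begin{aligned}
A_1(t) &= \langle\Psi_0|\hat A|\Psi_0\rangle - i\int_{t_0}^{t} dt'\,F(t')\,\langle\Psi_0|\left[\hat A(t-t_0),\hat B(t'-t_0)\right]|\Psi_0\rangle - \langle\Psi_0|\hat A|\Psi_0\rangle \\
&= -i\int_{t_0}^{t} dt'\,F(t')\,\langle\Psi_0|\left[\hat A(t-t_0),\hat B(t'-t_0)\right]|\Psi_0\rangle
\end{aligned} \tag{186}$$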
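The first-order formula (186) can be checked numerically on the smallest possible example. The sketch below is an illustration only, under several assumptions: a two-level system with excitation energy `OMEGA`, the choice $\hat A = \hat B = \sigma_x$, and a constant field `FIELD` switched on at $t_0 = 0$, none of which come from the notes. For this model the commutator integral in Eq. (186) can be done by hand and gives $A_1(t) = -2F\,[1-\cos(\Omega t)]/\Omega$, which the code compares with exact propagation of the state.

```python
import numpy as np

OMEGA = 1.0       # excitation energy E_1 - E_0 of the (hypothetical) model
FIELD = 1.0e-3    # constant field strength F, switched on at t0 = 0

H0 = np.diag([0.0, OMEGA])
SX = np.array([[0.0, 1.0], [1.0, 0.0]])   # used for both observables A and B


def exact_response(t):
    """A(t) - A_0 from exact propagation under H0 + F*B (here A_0 = 0)."""
    evals, evecs = np.linalg.eigh(H0 + FIELD * SX)
    psi0 = np.array([1.0, 0.0], dtype=complex)      # ground state of H0
    psi_t = evecs @ (np.exp(-1j * evals * t) * (evecs.conj().T @ psi0))
    return float(np.real(psi_t.conj() @ SX @ psi_t))


def heisenberg(op, tau):
    """Return exp(+i H0 tau) op exp(-i H0 tau) for the diagonal H0."""
    phase = np.diag(np.exp(1j * np.diag(H0) * tau))
    return phase @ op @ phase.conj().T


def kubo_response(t, n_grid=2001):
    """A_1(t) = -i int_0^t dt' F <0|[A(t - t'), B]|0>, by the trapezoid rule."""
    grid = np.linspace(0.0, t, n_grid)
    values = np.empty(n_grid, dtype=complex)
    for k, t_prime in enumerate(grid):
        a_tau = heisenberg(SX, t - t_prime)
        commutator = a_tau @ SX - SX @ a_tau
        values[k] = -1j * FIELD * commutator[0, 0]
    step = grid[1] - grid[0]
    integral = step * (values.sum() - 0.5 * (values[0] + values[-1]))
    return float(np.real(integral))


if __name__ == "__main__":
    for time in (2.0, 5.0):
        print(time, exact_response(time), kubo_response(time))
```

The two numbers agree up to terms of higher order in the field, which is the content of the truncation at $\hat O(F^2)$ in Eq. (185).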

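Now consider

$$\begin{aligned}
\left[\hat A(t-t_0),\hat B(t'-t_0)\right] &= \hat A(t-t_0)\hat B(t'-t_0) - \hat B(t'-t_0)\hat A(t-t_0) \\
&= e^{i(t-t_0)\hat H_0}\hat A\,e^{-i(t-t')\hat H_0}\hat B\,e^{-i(t'-t_0)\hat H_0} - e^{i(t'-t_0)\hat H_0}\hat B\,e^{i(t-t')\hat H_0}\hat A\,e^{-i(t-t_0)\hat H_0} \\
&= e^{i(t'-t_0)\hat H_0}\left[\hat A(t-t'),\hat B\right]e^{-i(t'-t_0)\hat H_0},
\end{aligned} \tag{187}$$

where $\hat A(t-t') = e^{i(t-t')\hat H_0}\hat A\,e^{-i(t-t')\hat H_0}$. Since $\Psi_0$ is an eigenstate of $\hat H_0$, the unitary factors $e^{\pm i(t'-t_0)\hat H_0}$ reduce to phases that cancel in the expectation value:

$$\begin{aligned}
\langle\Psi_0|\left[\hat A(t-t_0),\hat B(t'-t_0)\right]|\Psi_0\rangle
&= \langle\Psi_0|e^{i(t'-t_0)\hat H_0}\left[\hat A(t-t'),\hat B\right]e^{-i(t'-t_0)\hat H_0}|\Psi_0\rangle \\
&= \langle\Psi_0|e^{i(t'-t_0)E_0}\left[\hat A(t-t'),\hat B\right]e^{-i(t'-t_0)E_0}|\Psi_0\rangle \\
&= \langle\Psi_0|\left[\hat A(t-t'),\hat B\right]|\Psi_0\rangle.
\end{aligned} \tag{188}$$

Thus the linear response from Eq. (186), using Eq. (188), is

$$A_1(t) = -i\int_{t_0}^{t} dt'\,F(t')\,\langle\Psi_0|\left[\hat A(t-t'),\hat B\right]|\Psi_0\rangle \tag{189}$$

The response function is defined as

$$\chi_{AB}(t-t') \equiv -i\,\Theta(t-t')\,\langle\Psi_0|\left[\hat A(t-t'),\hat B\right]|\Psi_0\rangle, \tag{190}$$

where $\Theta(t-t')$ is the step function, which has the value 1 for $t \ge t'$ and zero otherwise. This ensures causality and allows us to replace the upper limit of the integral by $\infty$. Thus the linear response of $\hat A$ in terms of the response function is

$$A_1(t) = \int_{-\infty}^{\infty} dt'\,F(t')\,\chi_{AB}(t-t') \tag{191}$$

where the lower limit has been extended to $-\infty$ because the field $F(t)$ is zero for all times before $t_0$. Let us write Eq. (191) in Fourier space:

$$\begin{aligned}
\frac{1}{2\pi}\int d\omega\, A_1(\omega)\,e^{-i\omega t}
&= \frac{1}{(2\pi)^2}\int dt'\int d\omega\,d\omega'\,F(\omega)\,\chi_{AB}(\omega')\,e^{-i\omega t'}e^{-i(t-t')\omega'} \\
&= \frac{1}{(2\pi)^2}\int dt'\int d\omega\,d\omega'\,F(\omega)\,\chi_{AB}(\omega')\,e^{-i(\omega-\omega')t'}e^{-i\omega' t} \\
&= \frac{1}{2\pi}\int d\omega\,d\omega'\,F(\omega)\,\chi_{AB}(\omega')\,\delta(\omega-\omega')\,e^{-i\omega' t} \\
&= \frac{1}{2\pi}\int d\omega\,F(\omega)\,\chi_{AB}(\omega)\,e^{-i\omega t},
\end{aligned} \tag{192}$$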

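where we used $2\pi\,\delta(\omega-\omega') = \int dt'\,e^{-i(\omega-\omega')t'}$. The relation in Eq. (192) holds for any value of $t$, so we can write

$$A_1(\omega) = F(\omega)\,\chi_{AB}(\omega) \tag{193}$$

The Fourier transform of Eq. (190) gives the frequency-dependent response:

$$\begin{aligned}
\chi_{AB}(\omega) &= \int d\tau\,\chi_{AB}(\tau)\,e^{i\omega\tau} = -i\int d\tau\,\Theta(\tau)\,\langle\Psi_0|[\hat A(\tau),\hat B]|\Psi_0\rangle\,e^{i\omega\tau} \\
&= \lim_{\eta\to 0^+}\frac{1}{2\pi}\int d\tau\,d\omega'\,\frac{e^{-i\omega'\tau}}{\omega'+i\eta}\,\langle\Psi_0|[\hat A(\tau),\hat B]|\Psi_0\rangle\,e^{i\omega\tau},
\end{aligned} \tag{194}$$

where we have used the integral representation of the step function, $\Theta(\tau) = \lim_{\eta\to 0^+}\frac{i}{2\pi}\int d\omega'\,\frac{e^{-i\omega'\tau}}{\omega'+i\eta}$. Now, using $1 = \sum_{j=0}^{\infty}|\Psi_j\rangle\langle\Psi_j|$, we get

$$\chi_{AB}(\omega) = \lim_{\eta\to 0^+}\sum_{j=0}^{\infty}\frac{1}{2\pi}\int d\tau\,d\omega'\,\frac{\langle\Psi_0|\hat A|\Psi_j\rangle\langle\Psi_j|\hat B|\Psi_0\rangle}{\omega'+i\eta}\,e^{-i(\omega'-\omega+\Omega_j)\tau}
- \lim_{\eta\to 0^+}\sum_{j=0}^{\infty}\frac{1}{2\pi}\int d\tau\,d\omega'\,\frac{\langle\Psi_0|\hat B|\Psi_j\rangle\langle\Psi_j|\hat A|\Psi_0\rangle}{\omega'+i\eta}\,e^{-i(\omega'-\omega-\Omega_j)\tau}, \tag{195}$$

where $\Omega_j = E_j - E_0$. Now, using the standard integral

$$\frac{1}{2\pi}\int d\tau\,d\omega'\,\frac{e^{-i(\omega'-\omega+\Omega_j)\tau}}{\omega'+i\eta} = \int d\omega'\,\frac{\delta(\omega'-\omega+\Omega_j)}{\omega'+i\eta} = \frac{1}{\omega-\Omega_j+i\eta}, \tag{196}$$

we can write Eq. (195) as

$$\chi_{AB}(\omega) = \lim_{\eta\to 0^+}\sum_{j=1}^{\infty}\left[\frac{\langle\Psi_0|\hat A|\Psi_j\rangle\langle\Psi_j|\hat B|\Psi_0\rangle}{\omega-\Omega_j+i\eta} - \frac{\langle\Psi_0|\hat B|\Psi_j\rangle\langle\Psi_j|\hat A|\Psi_0\rangle}{\omega+\Omega_j+i\eta}\right], \tag{197}$$

where the $j = 0$ contributions of the first and second terms cancel. This is the so-called Lehmann representation of the response function.

1. Density-density response function

Consider a perturbation coupled to the electron density as

$$\hat H_1(t) = \int d^3r'\,v_1(\mathbf{r}',t)\,\hat n(\mathbf{r}'), \tag{198}$$

where $v_1(\mathbf{r}',t)$ is the fluctuation in the external potential. The corresponding change in the density is

$$n_1(\mathbf{r},t) = \int dt'\int d^3r'\,\chi_{nn}(\mathbf{r},\mathbf{r}',t-t')\,v_1(\mathbf{r}',t'), \tag{199}$$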

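where $\chi_{nn}(\mathbf{r},\mathbf{r}',t-t')$ is the density-density response function, which can be written as

$$\chi_{nn}(\mathbf{r},\mathbf{r}',t-t') = -i\,\Theta(t-t')\,\langle\Psi_0|\left[\hat n(\mathbf{r},t-t'),\hat n(\mathbf{r}')\right]|\Psi_0\rangle. \tag{200}$$

In frequency space we can write

$$n_1(\mathbf{r},\omega) = \int d^3r'\,\chi_{nn}(\mathbf{r},\mathbf{r}',\omega)\,v_1(\mathbf{r}',\omega), \tag{201}$$

where $\chi_{nn}(\mathbf{r},\mathbf{r}',\omega)$ is the density-density response function in frequency space. A fluctuation in the Kohn-Sham potential $v_s(\mathbf{r},\omega)$ induces a change in the density

$$n_1(\mathbf{r},\omega) = \int d^3r'\,\chi_s(\mathbf{r},\mathbf{r}',\omega)\,v_{s1}(\mathbf{r}',\omega), \tag{202}$$

where $\chi_s(\mathbf{r},\mathbf{r}',\omega)$ is the Kohn-Sham response function. The Kohn-Sham potential is given as

$$v_s(\mathbf{r},\omega) = v_H(\mathbf{r},\omega) + v_{xc}(\mathbf{r},\omega) + v_{ext}(\mathbf{r},\omega)$$

$$v_{s1}(\mathbf{r},\omega) = v_{H1}(\mathbf{r},\omega) + v_{xc1}(\mathbf{r},\omega) + v_{ext1}(\mathbf{r},\omega) \tag{203}$$

$$\begin{aligned}
&= \int d^3r'\,\left.\frac{\delta v_H[n](\mathbf{r},\omega)}{\delta n(\mathbf{r}',\omega)}\right|_{n_0} n_1(\mathbf{r}',\omega) + \int d^3r'\,\left.\frac{\delta v_{xc}[n](\mathbf{r},\omega)}{\delta n(\mathbf{r}',\omega)}\right|_{n_0} n_1(\mathbf{r}',\omega) + v_1(\mathbf{r},\omega) \\
&= \int d^3r'\,\frac{n_1(\mathbf{r}',\omega)}{|\mathbf{r}-\mathbf{r}'|} + \int d^3r'\,f_{xc}(\mathbf{r},\mathbf{r}',\omega)\,n_1(\mathbf{r}',\omega) + v_1(\mathbf{r},\omega) \\
&= \int d^3r'\,\left[w(\mathbf{r},\mathbf{r}') + f_{xc}(\mathbf{r},\mathbf{r}',\omega)\right] n_1(\mathbf{r}',\omega) + v_1(\mathbf{r},\omega) \\
&= \int d^3r'\,f_{Hxc}(\mathbf{r},\mathbf{r}',\omega)\,n_1(\mathbf{r}',\omega) + v_1(\mathbf{r},\omega)
\end{aligned} \tag{204}$$

where $w(\mathbf{r},\mathbf{r}') = \frac{1}{|\mathbf{r}-\mathbf{r}'|}$, $f_{xc}(\mathbf{r},\mathbf{r}',\omega) = \left.\frac{\delta v_{xc}[n](\mathbf{r},\omega)}{\delta n(\mathbf{r}',\omega)}\right|_{n_0}$ is the so-called exchange-correlation kernel, and $f_{Hxc}(\mathbf{r},\mathbf{r}',\omega) = w(\mathbf{r},\mathbf{r}') + f_{xc}(\mathbf{r},\mathbf{r}',\omega)$ is the so-called Hartree-xc kernel. The KS response function $\chi_s$ relates the change of the density $n_1$ to the change of the KS potential given in Eq. (204):

$$\begin{aligned}
n_1(\mathbf{r},\omega) &= \int d^3r'\,\chi_s(\mathbf{r},\mathbf{r}',\omega)\,v_{s1}(\mathbf{r}',\omega) \\
&= \int d^3r'\int d^3r''\,\chi_s(\mathbf{r},\mathbf{r}',\omega)\,f_{Hxc}(\mathbf{r}',\mathbf{r}'',\omega)\,n_1(\mathbf{r}'',\omega) + \int d^3r'\,\chi_s(\mathbf{r},\mathbf{r}',\omega)\,v_1(\mathbf{r}',\omega).
\end{aligned} \tag{205}$$

Consider

$$\int d^3r'\int d^3r''\,\chi_s(\mathbf{r},\mathbf{r}',\omega)\,f_{Hxc}(\mathbf{r}',\mathbf{r}'',\omega)\,n_1(\mathbf{r}'',\omega) = \int d^3r'\int d^3r''\int d^3r'''\,\chi_s(\mathbf{r},\mathbf{r}',\omega)\,f_{Hxc}(\mathbf{r}',\mathbf{r}'',\omega)\,\chi(\mathbf{r}'',\mathbf{r}''',\omega)\,v_1(\mathbf{r}''',\omega) \tag{206}$$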

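Using Eq. (206) and Eq. (201) in Eq. (205), we can write

$$\int d^3r'\,\chi(\mathbf{r},\mathbf{r}',\omega)\,v_1(\mathbf{r}',\omega) = \int d^3r'\,\chi_s(\mathbf{r},\mathbf{r}',\omega)\,v_1(\mathbf{r}',\omega) + \int d^3r'\int d^3r''\int d^3r'''\,\chi_s(\mathbf{r},\mathbf{r}',\omega)\,f_{Hxc}(\mathbf{r}',\mathbf{r}'',\omega)\,\chi(\mathbf{r}'',\mathbf{r}''',\omega)\,v_1(\mathbf{r}''',\omega) \tag{207}$$

Since this relationship is valid for an arbitrary perturbation $v_1$, one can write

$$\begin{aligned}
\chi(\mathbf{r},\mathbf{r}',\omega) &= \chi_s(\mathbf{r},\mathbf{r}',\omega) + \int d^3r''\int d^3r'''\,\chi_s(\mathbf{r},\mathbf{r}'',\omega)\,f_{Hxc}(\mathbf{r}'',\mathbf{r}''',\omega)\,\chi(\mathbf{r}''',\mathbf{r}',\omega) \\
&= \chi_s(\mathbf{r},\mathbf{r}',\omega) + \int d^3r''\int d^3r'''\,\chi_s(\mathbf{r},\mathbf{r}'',\omega)\left[w(\mathbf{r}'',\mathbf{r}''') + f_{xc}(\mathbf{r}'',\mathbf{r}''',\omega)\right]\chi(\mathbf{r}''',\mathbf{r}',\omega)
\end{aligned} \tag{208}$$

This equation is the so-called Dyson screening equation. A formal iterative solution is possible if one knows the exchange-correlation kernel $f_{xc}$. If we set $f_{xc} = 0$, we end up with the random-phase approximation (RPA).

2. Calculation of properties from response functions

The response function can be used to calculate the polarizability of a system, which is a very important physical property. If a system is subject to an electric field $\mathbf{E}(t) = \varepsilon\sin(\omega t)\,\hat{\mathbf{e}}_z$, then the first-order dipole polarization $\mathbf{p}_1$ is given as

$$\mathbf{p}_1(t) = \int dt'\,\boldsymbol{\alpha}(t-t')\,\mathbf{E}(t'), \tag{209}$$

$$\mathbf{p}_1(\omega) = \boldsymbol{\alpha}(\omega)\,\mathbf{E}(\omega) \tag{210}$$

where $\boldsymbol{\alpha}$ is the dipole-dipole polarizability tensor. The perturbation resulting from $\mathbf{E}(t)$ is $v_1 = z\,\varepsilon\sin(\omega t)$, which in Fourier space becomes $v_1(\mathbf{r},\omega) = \varepsilon z/2$. It polarizes the system of electrons and changes the density by

$$n_1(\mathbf{r},\omega) = \int d^3r'\,\chi(\mathbf{r},\mathbf{r}',\omega)\,v_1(\mathbf{r}',\omega) \tag{211}$$

Since the electrons carry negative charge, the $z$-component of the polarization is

$$p_{1z} = -\int d^3r\,z\,n_1(\mathbf{r},\omega) \tag{212}$$

$$= -\int d^3r\,d^3r'\,z\,\chi(\mathbf{r},\mathbf{r}',\omega)\,v_1(\mathbf{r}',\omega) \tag{213}$$

$$= -\frac{\varepsilon}{2}\int d^3r\,d^3r'\,z\,\chi(\mathbf{r},\mathbf{r}',\omega)\,z' \tag{214}$$

Comparison of Eqs. (210) and (214) leads to the conclusion that the dipole-dipole polarizability $\alpha_{zz}$ is given as

$$\alpha_{zz}(\omega) = -\int d^3r\,d^3r'\,z\,\chi(\mathbf{r},\mathbf{r}',\omega)\,z'. \tag{215}$$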
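On a discrete set of grid points the Dyson equation (208) becomes the matrix equation $\chi = \chi_s + \chi_s f_{Hxc}\,\chi$, which can be solved either by a linear solve or by the iteration mentioned above, and Eq. (215) becomes a double sum. The sketch below illustrates this under strong simplifying assumptions that are not taken from the notes: a hypothetical three-point grid `Z` with the quadrature weights absorbed into the matrices, a static one-pole model for $\chi_s$ built from a made-up transition density `U`, and an arbitrary exponential model kernel `F_HXC`.

```python
import numpy as np

# Hypothetical model data, for illustration only.
Z = np.array([-1.0, 0.0, 1.0])        # z coordinates of the three grid points
U = np.array([-0.5, 0.0, 0.5])        # model transition density on the grid
OMEGA_S = 1.0                         # model Kohn-Sham excitation energy

# Static (omega = 0) one-pole response: chi_s = -2 |u><u| / omega_s.
CHI_S = -2.0 * np.outer(U, U) / OMEGA_S
# Model Hartree-xc kernel; any symmetric matrix would do for the demonstration.
F_HXC = 0.3 * np.exp(-np.abs(Z[:, None] - Z[None, :]))


def solve_dyson(chi_s, f_hxc):
    """Solve chi = chi_s + chi_s f chi, i.e. (1 - chi_s f) chi = chi_s."""
    identity = np.eye(chi_s.shape[0])
    return np.linalg.solve(identity - chi_s @ f_hxc, chi_s)


def iterate_dyson(chi_s, f_hxc, n_iter=100):
    """Iterative solution: repeatedly insert chi into the right-hand side."""
    chi = chi_s.copy()
    for _ in range(n_iter):
        chi = chi_s + chi_s @ f_hxc @ chi
    return chi


def polarizability(chi):
    """Discrete version of alpha_zz = - sum_{r r'} z chi(r, r') z'."""
    return float(-Z @ chi @ Z)


if __name__ == "__main__":
    chi = solve_dyson(CHI_S, F_HXC)
    print("alpha from chi_s:", polarizability(CHI_S))
    print("alpha from chi  :", polarizability(chi))
```

The iteration converges here only because the model kernel is weak compared with the excitation energy. The direct linear solve does not rely on that.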

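Now, using the Lehmann representation of the response function $\chi$ and the fact that

$$\int d^3r\,d^3r'\,z\,z'\,\langle\Psi_0|\hat n(\mathbf{r})|\Psi_j\rangle\langle\Psi_j|\hat n(\mathbf{r}')|\Psi_0\rangle = \langle\Psi_0|\sum_k z_k|\Psi_j\rangle\langle\Psi_j|\sum_l z_l|\Psi_0\rangle = \sum_k\sum_l\langle\Psi_0|z_k|\Psi_j\rangle\langle\Psi_j|z_l|\Psi_0\rangle = \left|\langle\Psi_0|Z|\Psi_j\rangle\right|^2, \tag{216}$$

where $Z = \sum_i z_i$, the polarizability can be written as

$$\alpha_{zz}(\omega) = -\lim_{\eta\to 0^+}\sum_{j=1}^{\infty}\left\{\frac{1}{\omega-\Omega_j+i\eta} - \frac{1}{\omega+\Omega_j+i\eta}\right\}\left|\langle\Psi_0|Z|\Psi_j\rangle\right|^2 = -\lim_{\eta\to 0^+}\sum_{j=1}^{\infty}\frac{2\Omega_j}{(\omega+i\eta)^2-\Omega_j^2}\left|\langle\Psi_0|Z|\Psi_j\rangle\right|^2 \tag{217}$$

More generally, the polarizability is defined as

$$\alpha_{ab}(\omega) = -\int d^3r\,d^3r'\,r_a\,r'_b\,\chi(\mathbf{r},\mathbf{r}',\omega) \tag{218}$$

$$= -\lim_{\eta\to 0^+}\sum_{j=1}^{\infty}\frac{2\Omega_j}{(\omega+i\eta)^2-\Omega_j^2}\,\langle\Psi_0|\sum_{l=1}^{N} r_{a,l}|\Psi_j\rangle\langle\Psi_j|\sum_{m=1}^{N} r_{b,m}|\Psi_0\rangle \tag{219}$$

B. Linear response in CC approach

1. CC equations

The Schrödinger equation for the coupled cluster wave function is already given by equation (165), and the corresponding wave function is given by equation (166):

$$\Psi_{\mathrm{CCD}} = e^{\hat T}|0\rangle \tag{220}$$

where $\hat T$ is the excitation operator and $|0\rangle$ is the Fermi vacuum state. For the normal-ordered Hamiltonian, the time-independent Schrödinger equation in the CC method is given as

$$\hat H_N\Psi_{\mathrm{CCD}} = \Delta E\,\Psi_{\mathrm{CCD}} \tag{221}$$

$$\hat H_N\Psi_{\mathrm{CCD}} - \Delta E\,\Psi_{\mathrm{CCD}} = 0$$

$$(\hat H_N - \Delta E)\,\Psi_{\mathrm{CCD}} = 0$$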

Here $\Delta E$ is the energy relative to the energy of the vacuum reference state. Applying $e^{-\hat T}$ from the left to the above equation yields

$$\left(e^{-\hat T}\hat H_N e^{\hat T} - \Delta E\right)|0\rangle = 0$$

which we can write as

$$(\bar H - \Delta E)\,|0\rangle = 0$$

where $\bar H = e^{-\hat T}\hat H_N e^{\hat T}$. Here we can see that $\bar H$, which is also called the CC effective Hamiltonian or the similarity-transformed CC Hamiltonian, is non-Hermitian and is a similarity transformation of the normal-ordered Hamiltonian. Also, $\Delta E$ is the eigenvalue corresponding to the state vector $|0\rangle$. Now let us take a closer look at the CC effective Hamiltonian $\bar H = e^{-\hat T}\hat H_N e^{\hat T}$. Using the Baker-Campbell-Hausdorff expansion, one can write

$$e^{-\hat B}\hat A\,e^{\hat B} = \hat A + [\hat A,\hat B] + \frac{1}{2!}[[\hat A,\hat B],\hat B] + \frac{1}{3!}[[[\hat A,\hat B],\hat B],\hat B] + \dots \tag{222}$$

The transformed CC Hamiltonian thus becomes

$$e^{-\hat T}\hat H_N e^{\hat T} = \hat H_N + [\hat H_N,\hat T] + \frac{1}{2!}[[\hat H_N,\hat T],\hat T] + \frac{1}{3!}[[[\hat H_N,\hat T],\hat T],\hat T] + \frac{1}{4!}[[[[\hat H_N,\hat T],\hat T],\hat T],\hat T] \tag{223}$$

Here the series terminates with the four-fold commutator because the Hamiltonian contains at most two-particle interactions. Now $\hat T$ contains only particle creation and hole annihilation operators, and the only possible non-zero contractions are

$$\overline{\hat a\,\hat b^\dagger} = \delta_{ab}, \qquad \overline{\hat i^\dagger\,\hat j} = \delta_{ij},$$

which shows that $[\hat T_m,\hat T_n] = 0$. Therefore the only non-zero terms come from commutators between $\hat H_N$ and $\hat T$. Since the only non-zero contractions are those of a particle creation operator with a particle annihilation operator on its left, and of a hole annihilation operator with a hole creation operator on its left, the only non-zero terms are those with $\hat H_N$ on the left. $\bar H$ therefore becomes

$$\bar H = \left(\hat H_N + \hat H_N\hat T + \frac{1}{2!}\hat H_N\hat T\hat T + \dots\right)_C = \left(\hat H_N e^{\hat T}\right)_C$$

Here $C$ denotes that only fully connected terms are included. Now we can insert this expression into the Schrödinger equation for the CC wave function:

$$\left(\hat H_N e^{\hat T}\right)_C|0\rangle = \Delta E\,|0\rangle \tag{224}$$
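The properties of $\bar H$ noted above (it is not Hermitian, it has the same eigenvalues as the original Hamiltonian, and its commutator expansion terminates) can be illustrated with small matrices. The following sketch is a toy model built on assumptions that are not part of the notes: `H` is an arbitrary symmetric 4x4 matrix, the first basis vector plays the role of $|0\rangle$, and `T` is a hypothetical "excitation" matrix with a single non-zero column so that $T^2 = 0$. Because of this last choice the series (222) stops already after the double commutator. In the real theory the termination after the four-fold commutator has a different origin, namely the two-body form of $\hat H_N$.

```python
import numpy as np

# Arbitrary symmetric model Hamiltonian (hypothetical numbers).
H = np.array([[0.0, 0.3, 0.2, 0.1],
              [0.3, 1.0, 0.1, 0.0],
              [0.2, 0.1, 1.5, 0.2],
              [0.1, 0.0, 0.2, 2.0]])

# "Excitation operator": maps the reference (index 0) onto the other basis
# vectors and gives zero otherwise, so that T @ T = 0 and exp(T) = 1 + T.
AMPLITUDES = np.array([0.0, 0.1, -0.2, 0.05])
T = np.zeros((4, 4))
T[:, 0] = AMPLITUDES


def commutator(a, b):
    """Return [a, b] = ab - ba."""
    return a @ b - b @ a


def similarity_transform(h, t):
    """exp(-T) H exp(T) for a nilpotent T with T^2 = 0."""
    identity = np.eye(h.shape[0])
    return (identity - t) @ h @ (identity + t)


def bch_series(h, t):
    """First three terms of the Baker-Campbell-Hausdorff expansion (222)."""
    first = commutator(h, t)
    second = commutator(first, t)
    return h + first + 0.5 * second


H_BAR = similarity_transform(H, T)

if __name__ == "__main__":
    print("H_bar symmetric?", np.allclose(H_BAR, H_BAR.T))
    print("eigenvalues of H    :", np.linalg.eigvalsh(H))
    print("eigenvalues of H_bar:", np.sort(np.linalg.eigvals(H_BAR).real))
```

The amplitudes used here are arbitrary, so `H_BAR` does not yet satisfy the amplitude equations. Only the general properties of the transformation are being demonstrated.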

Now let us define the ground-state and excited-state projection operators $P$ and $Q$ as

$$P = |0\rangle\langle 0| \tag{225}$$

$$Q = I - P \tag{226}$$

which have the properties

$$P^2 = P \tag{227}$$

$$Q^2 = (1-P)^2 = 1 - P - P + P^2 = Q \tag{228}$$

Therefore, using the projection operators, the CC equation (224) can be written as

$$\langle 0|\bar H|0\rangle = \Delta E \tag{229}$$

and

$$\langle\Phi_{ij\dots}^{ab\dots}|\bar H|0\rangle = 0 \tag{230}$$

i.e.

$$P\bar H P = \Delta E\,P \tag{231}$$

and

$$Q\bar H P = 0 \tag{232}$$

These are the CC energy and amplitude equations. We are going to use these equations and the Hellmann-Feynman theorem to derive the CC energy functional using linear response theory.

2. Hellmann-Feynman theorem

As we already know, response theory is a way of studying the properties of a molecular system in the presence of a perturbation (e.g., an external electric or magnetic field, or a displacement of the nuclei). Using the Hellmann-Feynman theorem, we can study first-order and higher-order properties even if the perturbed wave function is not known. The theorem states that a first-order property, i.e., the expectation value of the perturbation operator, is equal to the derivative of the energy with respect to the strength of the applied perturbation, taken at zero perturbation strength; higher-order properties correspond to higher derivatives. Let us try to prove the theorem. Since we are considering a system in a perturbing field, we can write $\hat H = \hat H(\lambda)$, and a Maclaurin series expansion gives

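$$\hat H(\lambda) = \hat H(0) + \lambda\hat H^{(1)} + \lambda^2\hat H^{(2)} + \dots \tag{233}$$

where

$$\hat H^{(n)} = \frac{1}{n!}\left.\frac{d^n\hat H}{d\lambda^n}\right|_{\lambda=0}$$

Here $\hat H(0)$ is the unperturbed Hamiltonian of the system and $\lambda$ is the perturbation parameter. For a linear perturbation all higher-order terms with $n > 1$ are zero, and the Hamiltonian is

$$\hat H(\lambda) = \hat H(0) + \lambda\hat H^{(1)} \tag{234}$$

Similarly, the energy and the wave function are expanded as

$$E(\lambda) = E^{(0)} + \lambda E^{(1)} + \dots \tag{235}$$

and

$$\Psi(\lambda) = \Psi^{(0)} + \lambda\Psi^{(1)} + \dots \tag{236}$$

Now the Schrödinger equation gives

$$\hat H(\lambda)\Psi(\lambda) = E(\lambda)\Psi(\lambda) \tag{237}$$

$$(\hat H(0) + \lambda\hat H^{(1)})(\Psi^{(0)} + \lambda\Psi^{(1)} + \dots) = (E^{(0)} + \lambda E^{(1)} + \dots)(\Psi^{(0)} + \lambda\Psi^{(1)} + \dots)$$

Collecting the terms linear in $\lambda$, projecting onto $\Psi^{(0)}$, and using $\hat H(0)\Psi^{(0)} = E^{(0)}\Psi^{(0)}$, we obtain

$$\frac{\langle\Psi^{(0)}|\hat H^{(1)}|\Psi^{(0)}\rangle}{\langle\Psi^{(0)}|\Psi^{(0)}\rangle} = E^{(1)} = \left.\frac{dE(\lambda)}{d\lambda}\right|_{\lambda=0} \tag{238}$$

which is the simplest case of the Hellmann-Feynman theorem: a first-order property of a system can be obtained as the derivative of the perturbed energy at $\lambda = 0$, i.e., at zero perturbation. As an example, take a system in an external electric field $\boldsymbol{\mathcal{E}}$ (the perturbation in this case). The energy of interaction of the system with the field is given by

$$E(\boldsymbol{\mathcal{E}}) = -\boldsymbol{\mathcal{E}}\cdot\sum_u q_u\,\mathbf{r}_u \tag{239}$$

It is evident from this equation that the first-order property, i.e., $\left.\frac{dE(\boldsymbol{\mathcal{E}})}{d\mathcal{E}_a}\right|_{\boldsymbol{\mathcal{E}}=0}$, is (up to sign) nothing but the dipole moment of the system. The higher-order properties can be obtained by the same method.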
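Equation (238) is easy to verify numerically. The sketch below does so for a hypothetical real symmetric 2x2 model, $\hat H(\lambda) = \hat H(0) + \lambda\hat H^{(1)}$, comparing the expectation value of $\hat H^{(1)}$ in the unperturbed ground state with a finite-difference derivative of the ground-state energy. The matrices `H0` and `H1` and the step size are arbitrary choices made for the illustration.

```python
import math

# Hypothetical 2x2 model: H(lam) = H0 + lam * H1 (real symmetric matrices).
H0 = ((0.0, 0.5), (0.5, 1.0))
H1 = ((0.2, 0.3), (0.3, -0.4))


def hamiltonian(lam):
    """Return H0 + lam * H1 as a nested tuple."""
    return tuple(tuple(H0[p][q] + lam * H1[p][q] for q in range(2))
                 for p in range(2))


def ground_state(h):
    """Lowest eigenvalue and normalized eigenvector of [[a, b], [b, c]], b != 0."""
    a, b, c = h[0][0], h[0][1], h[1][1]
    energy = 0.5 * (a + c) - math.hypot(0.5 * (a - c), b)
    x, y = b, energy - a            # solves (a - E) x + b y = 0
    norm = math.hypot(x, y)
    return energy, (x / norm, y / norm)


def expectation(op, psi):
    """<psi|op|psi> for a real normalized vector psi."""
    return sum(psi[p] * op[p][q] * psi[q] for p in range(2) for q in range(2))


def energy_derivative_fd(lam=0.0, step=1.0e-5):
    """Central finite-difference estimate of dE/dlam."""
    e_plus, _ = ground_state(hamiltonian(lam + step))
    e_minus, _ = ground_state(hamiltonian(lam - step))
    return (e_plus - e_minus) / (2.0 * step)


if __name__ == "__main__":
    _, psi0 = ground_state(hamiltonian(0.0))
    print("<psi0|H1|psi0> =", expectation(H1, psi0))
    print("dE/dlam (FD)   =", energy_derivative_fd())
```

Only the unperturbed wave function enters the expectation value, which is the practical content of the theorem.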

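3. Linear response CC for static perturbation

Now that we have all the required equations, we can derive the energy functional for CC. The CC equations are given by equations (231) and (232),

$$P\bar H P = \Delta E\,P$$

and

$$Q\bar H P = 0$$

Taking the derivative of equation (231) with respect to $\lambda$ gives

$$\frac{d\Delta E}{d\lambda}\hat P = \hat P\,\frac{d\bar H}{d\lambda}\,\hat P \tag{240}$$

Now $\bar H = e^{-\hat T}\hat H_N e^{\hat T}$, so that

$$\left.\frac{d\bar H}{d\lambda}\right|_{\lambda=0} = -e^{-\hat T}\frac{d\hat T}{d\lambda}\hat H_N e^{\hat T} + e^{-\hat T}\hat H_N e^{\hat T}\frac{d\hat T}{d\lambda} + e^{-\hat T}\frac{d\hat H_N}{d\lambda}e^{\hat T} = [\bar H,\hat T^{\lambda}] + \bar H^{[\lambda]}$$

where

$$\bar H^{[\lambda]} = e^{-\hat T}\left.\frac{d\hat H_N}{d\lambda}\right|_{\lambda=0}e^{\hat T} \qquad\text{and}\qquad \hat T^{\lambda} = \left.\frac{d\hat T}{d\lambda}\right|_{\lambda=0}$$

Inserting this expression into the energy derivative above, we need

$$\hat P[\bar H,\hat T^{\lambda}]\hat P = \hat P\bar H(\hat P+\hat Q)\hat T^{\lambda}\hat P - \hat P\hat T^{\lambda}(\hat P+\hat Q)\bar H\hat P = \Delta E\,\hat P\hat T^{\lambda}\hat P + \hat P\bar H\hat Q\hat T^{\lambda}\hat P - \Delta E\,\hat P\hat T^{\lambda}\hat P = \hat P\bar H\hat Q\hat T^{\lambda}\hat P$$

This expression can be simplified by using the CC amplitude equation as follows:

$$\hat Q\bar H\hat P = 0$$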

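i.e.

$$\hat Q\,\frac{d\bar H}{d\lambda}\,\hat P = 0$$

$$\hat Q\left\{\bar H^{[\lambda]} + [\bar H,\hat T^{\lambda}]\right\}\hat P = 0$$

Now, the second term gives

$$\hat Q[\bar H,\hat T^{\lambda}]\hat P = \hat Q\bar H(\hat P+\hat Q)\hat T^{\lambda}\hat P - \hat Q\hat T^{\lambda}(\hat P+\hat Q)\bar H\hat P = \hat Q(\hat Q\bar H\hat Q - \Delta E)\hat Q\hat T^{\lambda}\hat P$$

Therefore, using the fact that $\hat Q^2 = \hat Q$, we get

$$\hat Q\hat T^{\lambda}\hat P = \hat Q(\Delta E - \hat Q\bar H\hat Q)^{-1}\hat Q\bar H^{[\lambda]}\hat P$$

and equation (240) can be written as

$$\Delta E^{\lambda}\hat P = \hat P\bar H^{[\lambda]}\hat P + \hat P\bar H\hat Q(\Delta E - \hat Q\bar H\hat Q)^{-1}\hat Q\bar H^{[\lambda]}\hat P \tag{241}$$

where $\Delta E^{\lambda} = \left.\frac{d\Delta E}{d\lambda}\right|_{\lambda=0}$. In the above equation, $\hat T^{\lambda}$ has been eliminated. We can now define the effective resolvent operator

$$R(\lambda) = \hat Q\left[\Delta E(\lambda) - \hat Q\bar H(\lambda)\hat Q\right]^{-1}\hat Q$$

The above equation then becomes

$$\Delta E^{\lambda}\hat P = \hat P\bar H^{[\lambda]}\hat P + \hat P\bar H R\,\hat Q\bar H^{[\lambda]}\hat P \tag{242}$$

To emphasize that this equation is valid at $\lambda = 0$, let us write it in the following manner:

$$\Delta E^{(1)}\hat P = \hat P\bar H^{[\lambda]}(0)\hat P + \hat P\bar H(0)R(0)\hat Q\bar H^{[\lambda]}(0)\hat P$$

Here we can see that $\hat P\bar H(0)R(0)\hat Q$ is evaluated at $\lambda = 0$ and does not contain the perturbation operator. We therefore define a new operator $\Lambda$ as

$$\Lambda(0) = \hat P\bar H(0)R(0)\hat Q$$

and get

$$\Delta E^{\lambda}\hat P = \hat P(1+\Lambda)\bar H^{[\lambda]}\hat P \tag{243}$$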

which, when integrated over $\lambda$, gives

$$\Delta E\,\hat P = \hat P(1+\Lambda)\bar H\hat P \tag{244}$$

This is called the fundamental CC energy functional. The operator $\Lambda$ is independent of the perturbation: since it is determined at $\lambda = 0$, it needs to be solved for only once. The $\Lambda$ operator satisfies a set of linear equations that follow from stationarity conditions; these are the subject of the next section.

4. Lambda equations

To exploit the stationarity requirements of a functional, we set the coefficients of the variations of its arguments equal to zero, so that the Hellmann-Feynman theorem remains valid. We start with a functional defined as

$$\hat P\,E(\Lambda,\hat T)\,\hat P = \hat P(1+\Lambda)\bar H\hat P$$

where $\bar H = e^{-\hat T}\hat H_N e^{\hat T}$, $\Lambda$ is a de-excitation operator, and $\hat T$ is an excitation operator, satisfying the equations

$$\Lambda\hat P = 0, \qquad \Lambda\hat Q = \Lambda, \qquad \hat P\hat T = 0, \qquad \hat Q\hat T = \hat T$$

The variation of the functional with respect to its arguments is given by

$$\hat P\,\delta E\,\hat P = \hat P\,\delta\Lambda\,\hat Q\bar H\hat P + \hat P(1+\Lambda\hat Q)\,\delta\!\left(e^{-\hat T}\hat H_N e^{\hat T}\right)\hat P = 0 + \hat P(1+\Lambda\hat Q)[\bar H,\delta\hat T]\hat P \tag{245}$$

Since the functional is stationary with respect to its arguments, the coefficients of $\delta\Lambda$ and $\delta\hat T$ must be zero. The coefficient of $\delta\Lambda$ vanishes by virtue of the CC amplitude equations. We can simplify the second term using the CC equations $\hat P\bar H\hat P = \Delta E\,\hat P$ and $\hat Q\bar H\hat P = 0$, i.e.

$$\hat P(1+\Lambda\hat Q)[\bar H,\delta\hat T]\hat P = \hat P(\bar H + \Lambda\hat Q\bar H - \Delta E\,\Lambda)\hat Q\,\delta\hat T\,\hat P$$

The stationarity condition of the functional with respect to $\hat T$ then gives

$$\hat P(\bar H + \Lambda\hat Q\bar H - \Delta E\,\Lambda)\hat Q = 0$$

To evaluate $\Lambda$, we can either use the operator inversion contained in its definition, $\Lambda = \hat P\bar H R\,\hat Q$, which is a tedious procedure, or exploit the stationarity condition of the functional and obtain linear equations that $\Lambda$ satisfies.
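The role of $\Lambda$ in the energy derivative (243) can be made concrete with a small matrix model. The sketch below rests on several assumptions that are not part of the notes: `H0` and `V` are arbitrary symmetric matrices, the first basis vector plays the role of $|0\rangle$, the "cluster operator" is a matrix with a single non-zero column taken from the exact ground state in intermediate normalization (so the amplitude equations hold exactly and $e^{\hat T} = 1 + \hat T$), and the reference energy is not subtracted, so $\Delta E$ is simply the ground-state eigenvalue. Within this model, $\Lambda = \hat P\bar H R\,\hat Q$ is obtained from one linear solve, and $\langle 0|(1+\Lambda)\bar H^{[\lambda]}|0\rangle$ is compared with a finite-difference derivative of the energy.

```python
import numpy as np

# Hypothetical model matrices (symmetric); index 0 is the reference |0>.
H0 = np.array([[0.0, 0.3, 0.2, 0.1],
               [0.3, 1.0, 0.1, 0.0],
               [0.2, 0.1, 1.5, 0.2],
               [0.1, 0.0, 0.2, 2.0]])
V = np.array([[0.0, 0.2, 0.0, 0.1],
              [0.2, 0.3, 0.1, 0.0],
              [0.0, 0.1, -0.2, 0.1],
              [0.1, 0.0, 0.1, 0.4]])


def cc_like_solution(h):
    """Energy, T matrix and H_bar for the exact ground state of h."""
    evals, evecs = np.linalg.eigh(h)
    ground = evecs[:, 0]
    amplitudes = ground / ground[0]      # intermediate normalization
    amplitudes[0] = 0.0
    t_mat = np.zeros_like(h)
    t_mat[:, 0] = amplitudes             # T|0> = sum of excited components
    identity = np.eye(h.shape[0])
    h_bar = (identity - t_mat) @ h @ (identity + t_mat)
    return evals[0], t_mat, h_bar


def lambda_vector(energy, h_bar):
    """Row vector <0|Lambda = (P H_bar Q)(E - Q H_bar Q)^(-1)."""
    n = h_bar.shape[0]
    matrix = energy * np.eye(n - 1) - h_bar[1:, 1:]
    lam = np.zeros(n)
    lam[1:] = np.linalg.solve(matrix.T, h_bar[0, 1:])
    return lam


def energy_derivative(h, v):
    """<0|(1 + Lambda) exp(-T) V exp(T)|0>, the model analogue of Eq. (243)."""
    energy, t_mat, h_bar = cc_like_solution(h)
    left = lambda_vector(energy, h_bar)
    left[0] += 1.0                        # row vector <0|(1 + Lambda)
    identity = np.eye(h.shape[0])
    v_bar = (identity - t_mat) @ v @ (identity + t_mat)
    return float(left @ v_bar[:, 0])


def energy_derivative_fd(h, v, step=1.0e-5):
    """Central finite difference of the lowest eigenvalue of h + lam * v."""
    e_plus = np.linalg.eigvalsh(h + step * v)[0]
    e_minus = np.linalg.eigvalsh(h - step * v)[0]
    return float((e_plus - e_minus) / (2.0 * step))


if __name__ == "__main__":
    print("Lambda formula   :", energy_derivative(H0, V))
    print("finite difference:", energy_derivative_fd(H0, V))
```

In this model $\langle 0|(1+\Lambda)$ is the left eigenvector of the non-Hermitian $\bar H$, which is why no derivative of the amplitudes is needed.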

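Starting from the above equation, we get

$$\hat P(1+\Lambda\hat Q)(\bar H - \Delta E) = 0$$

Here an extra term $\Delta E\,\hat P\hat Q$ (which is equal to zero, since $\hat P$ and $\hat Q$ are mutually orthogonal projectors) has been added to derive this form. Projecting with $\hat P$ from the right reproduces the CC energy functional, whereas projecting with $\hat Q$ from the right gives the $\Lambda$ equations:

$$\hat P(1+\Lambda\hat Q)(\bar H - \Delta E)\hat Q = 0$$

$$\hat P\bar H\hat Q + \hat P\Lambda\bar H\hat Q - \Delta E\,\hat P\Lambda\hat Q = 0$$

This equation depends explicitly on the energy, which can produce disconnected terms. The energy dependence can be eliminated in the following way:

$$\hat P\Lambda\bar H\hat Q = \hat P[\Lambda,\bar H]\hat Q + \hat P\bar H(\hat P+\hat Q)\Lambda\hat Q = \hat P(\Lambda\bar H)_C\hat Q + \Delta E\,\hat P\Lambda\hat Q + \hat P\bar H\hat Q\Lambda\hat Q$$

Therefore, after the energy dependence has been eliminated, the equation becomes

$$\hat P\bar H\hat Q + \hat P(\Lambda\bar H)_C\hat Q + \hat P\bar H\hat Q\Lambda\hat Q = 0 \tag{246}$$

or

$$\hat P(\hat H_N e^{\hat T})_C\hat Q + \hat P\left(\Lambda(\hat H_N e^{\hat T})_C\right)_C\hat Q + \hat P(\hat H_N e^{\hat T})_C\hat Q\Lambda\hat Q = 0 \tag{247}$$

where the last term is disconnected. For an arbitrary excited determinant, the above equation can be written in explicit form as

$$\langle 0|(\hat H_N e^{\hat T})_C|\Phi_{ij\dots}^{ab\dots}\rangle + \langle 0|\left(\Lambda(\hat H_N e^{\hat T})_C\right)_C|\Phi_{ij\dots}^{ab\dots}\rangle + \sum_{\substack{k<l<\dots \\ c<d<\dots}}\langle 0|(\hat H_N e^{\hat T})_C|\Phi_{kl\dots}^{cd\dots}\rangle\langle\Phi_{kl\dots}^{cd\dots}|\Lambda|\Phi_{ij\dots}^{ab\dots}\rangle = 0 \tag{248}$$

These are the $\Lambda$ equations. For a singly excited state the equation can be written as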

128 < 0 Ĥ N e ˆT φ a i + < 0 Λ(Ĥ N e ˆT ) C φ a i C (249) where the disconnected term doesn t contribute to the Λ equation since there is no intermediate state between vacuum and singly excited state. For doubly excited state, Λ equation looks like < 0 Ĥ N e ˆT φ ab ij + < 0 Λ(Ĥ N e ˆT ) C φ ab ij C (250) + k=ij c=a,b < 0 Ĥ N e ˆT φ c k C φ c k Λ φab ij = 0 Now once we know these linear equations, we can solve for Λ. XI. TREATMENT OF EXCITED STATES A. Excitation energies from TD-DFT The poles of density-density response function give exact excitation energies. A finite response can be sustained by a system at its excitation frequencies even in the absence of any external perturbation. Therefore, setting v 1 = 0 in Eq. (205) one can write n 1 (r,ω) = d 3 r d 3 r χ s (r,r,ω)f Hxc (r,r,ω)n 1 (r,ω) (251) In the spin dependent formalism n 1σ (r,ω) = d 3 r d 3 r χ s,σσ (r,r,ω)f Hxc,σ σ (r,r,ω)n 1σ (r,ω) (252) σ σ If we pre-multiply with d 3 r f Hxc,σσ (r,r,ω) and using the notation g σσ (r,ω) = d 3 r f Hxc,σσ (r,r,ω)n 1σ (r,ω), (253) we can write Eq. 252 as g σσ (r,ω) = d 3 r d 3 r f Hxc,σσ (r,r,ω)χ s,σ σ (r,r,ω)g σ σ (r,ω) (254) σ σ The Kohn-Sham response function can be written as χ s,σσ (r,r,ω) = δ σσ jk α jkσ Φ jkσ (r)φ jkσ (r ) Ω ω jkσ + iη, (255) 128

129 where α jkσ = f kσ f jσ (256) Φ jkσ (r) = φ jσ (r)φ kσ (r) (257) ω jkσ = ε jσ ε kσ (258) g σσ (r,ω) = = jkσ σ jkσ δ σ σ α jkσ Ω ω jkσ d 3 r d 3 r f Hxc,σσ (r,r,ω)φ jkσ (r )Φ jkσ (r )g σ σ (r,ω) (259) α jkσ d 3 r f Ω ω Hxc,σσ (r,r,ω)φjkσ jkσ (r ) d 3 r Φ jkσ (r )g σ σ (r,ω) } {{ }} {{ } (260) Now multiplying Eq. 260 by σ d 3 rφ jk σ (r) and using H jkσ (Ω) = d 3 r Φ jkσ (r)g σσ (r,ω), (261) σ K jkσ,j k σ = d 3 rd 3 r Φ jkσ (r)f Hxc,σσ (r,r,ω)φ j k σ (r ), (262) we can write α j H jkσ (Ω) = k σ d 3 rd 3 r Φ Ω ω j k σ j k σ jkσ (r)f Hxc,σσ (r,r,ω)φj k σ (r ) H j k σ (263) } {{ } α j H jkσ (Ω) = k σ K Ω ω jkσ,j k σ H j k σ (264) j k σ j k σ H jkσ (Ω) = α j k σ K jkσ,j k σ β j k σ (Ω) (265) j k σ β j k σ (Ω) = H j k σ Ω ω j k σ (266) Now multiplying and dividing Eq. 265 by Ω ω jkσ we can write (Ω ω jkσ )β jkσ = α j k σ K jkσ,j k σ β j k σ (Ω), (267) j k σ ω jkσ β jkσ (Ω) + α j k σ K jkσ,j k σ β j k σ (Ω) = Ωβ jkσ (Ω) (268) j k σ [ δj,j δ kk δ σσ ω j k σ + α j k σ K ] jkσ,j k σ βj k σ (Ω) = Ωβ jkσ (Ω) (269) j k σ 129

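The indices $j$ ($j'$) and $k$ ($k'$) can take values such that if one of them runs over occupied orbitals, then the other has to run over virtual orbitals. We denote occupied orbitals by $i$ ($i'$) and virtual orbitals by $a$ ($a'$). Therefore, Eq. (269) can be written as

$$\sum_{j'k'\sigma'}\left[\delta_{ij'}\delta_{ak'}\delta_{\sigma\sigma'}\,\omega_{j'k'\sigma'} + \alpha_{j'k'\sigma'}K_{ia\sigma,j'k'\sigma'}\right]\beta_{j'k'\sigma'}(\Omega) = \Omega\,\beta_{ia\sigma}(\Omega) \tag{270}$$

$$\sum_{j'k'\sigma'}\left[\delta_{aj'}\delta_{ik'}\delta_{\sigma\sigma'}\,\omega_{j'k'\sigma'} + \alpha_{j'k'\sigma'}K_{ai\sigma,j'k'\sigma'}\right]\beta_{j'k'\sigma'}(\Omega) = \Omega\,\beta_{ai\sigma}(\Omega) \tag{271}$$

Splitting the sum in Eq. (270) into the two index classes and using $\alpha_{i'a'\sigma'} = -1$ and $\alpha_{a'i'\sigma'} = +1$, its left-hand side becomes

$$\begin{aligned}
&\sum_{i'a'\sigma'}\left[\delta_{ii'}\delta_{aa'}\delta_{\sigma\sigma'}\,\omega_{i'a'\sigma'} + \alpha_{i'a'\sigma'}K_{ia\sigma,i'a'\sigma'}\right]\beta_{i'a'\sigma'} + \sum_{i'a'\sigma'}\left[\delta_{ia'}\delta_{ai'}\delta_{\sigma\sigma'}\,\omega_{a'i'\sigma'} + \alpha_{a'i'\sigma'}K_{ia\sigma,a'i'\sigma'}\right]\beta_{a'i'\sigma'} \\
&\quad= \sum_{i'a'\sigma'}\left[(\delta_{ii'}\delta_{aa'}\delta_{\sigma\sigma'}\,\omega_{i'a'\sigma'} - K_{ia\sigma,i'a'\sigma'})\,\beta_{i'a'\sigma'} + K_{ia\sigma,a'i'\sigma'}\,\beta_{a'i'\sigma'}\right]
\end{aligned} \tag{272}$$

Therefore, Eqs. (270) and (271) can be written as

$$\sum_{i'a'\sigma'}\left[(\delta_{ii'}\delta_{aa'}\delta_{\sigma\sigma'}\,\omega_{i'a'\sigma'} - K_{ia\sigma,i'a'\sigma'})\,\beta_{i'a'\sigma'} + K_{ia\sigma,a'i'\sigma'}\,\beta_{a'i'\sigma'}\right] = \Omega\,\beta_{ia\sigma} \tag{273}$$

$$\sum_{i'a'\sigma'}\left[-K_{ai\sigma,i'a'\sigma'}\,\beta_{i'a'\sigma'} + (\delta_{aa'}\delta_{ii'}\delta_{\sigma\sigma'}\,\omega_{a'i'\sigma'} + K_{ai\sigma,a'i'\sigma'})\,\beta_{a'i'\sigma'}\right] = \Omega\,\beta_{ai\sigma} \tag{274}$$

Now, defining $X_{ia\sigma} = -\beta_{ia\sigma}$ and $Y_{ia\sigma} = \beta_{ai\sigma}$ and using the fact that $\omega_{i'a'\sigma'} = -\omega_{a'i'\sigma'}$, we can rewrite Eqs. (273) and (274) as

$$\sum_{i'a'\sigma'}\left[(\delta_{ii'}\delta_{aa'}\delta_{\sigma\sigma'}\,\omega_{a'i'\sigma'} + K_{ia\sigma,i'a'\sigma'})\,X_{i'a'\sigma'} + K_{ia\sigma,a'i'\sigma'}\,Y_{i'a'\sigma'}\right] = -\Omega\,X_{ia\sigma} \tag{275}$$

$$\sum_{i'a'\sigma'}\left[K_{ai\sigma,i'a'\sigma'}\,X_{i'a'\sigma'} + (\delta_{aa'}\delta_{ii'}\delta_{\sigma\sigma'}\,\omega_{a'i'\sigma'} + K_{ai\sigma,a'i'\sigma'})\,Y_{i'a'\sigma'}\right] = \Omega\,Y_{ia\sigma} \tag{276}$$

If the Kohn-Sham orbitals are real, then from Eq. (262) the matrix $K$ is unchanged when the two indices within either pair are interchanged; in particular $K_{ia\sigma,i'a'\sigma'} = K_{ai\sigma,a'i'\sigma'}$. Therefore, we can write

$$\begin{pmatrix} A & B \\ B & A \end{pmatrix}\begin{pmatrix} X \\ Y \end{pmatrix} = \Omega\begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} X \\ Y \end{pmatrix} \tag{277}$$

where the matrices $A$ and $B$ are called Hessians and have the elements

$$A_{ia\sigma,i'a'\sigma'}(\Omega) = \delta_{ii'}\delta_{aa'}\delta_{\sigma\sigma'}\,\omega_{a'i'\sigma'} + K_{ia\sigma,i'a'\sigma'}(\Omega) \tag{278}$$

$$B_{ia\sigma,i'a'\sigma'}(\Omega) = K_{ia\sigma,i'a'\sigma'}(\Omega) \tag{279}$$

Eq. (277) is known as the Casida equation, which can in principle be solved to get the exact excitation energies $\Omega_n$. The solution of Eq. (277) also gives $-\Omega_n$, which is the de-excitation energy. The excitation energy $\Omega_n$ may represent single as well as multiple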

131 excitations.one needs the exact Kohn-Sham orbitals and corresponding energy eigenvalues along with the exchange correlation kernel f xc which is unknown in general, moreover, it will be an infinite dimensional problem and in practice one needs to approximate f xc and solve it for finite dimensions. One such approximation is Tamm-Dancoff approximation which ignores deexcitation processes. Another approximation called small matrix approximation (SMA) in which off-diagonal elements of the matrices A and B are neglected which may work well in special conditions. B. Limitations of single-reference CC metods The conventional, single-reference, coupled-cluster method is very effective for electronic states domminated by a single determinant, such as most molecular ground states near their equilibrium geometry. Such stateds are predominantly closed-shell singlet states, and CC calculations on them produce pure singlet wave functions. But even these states become dominated by more than one determinant when one or more bonds are stretched close to breaking, besides, most excited, ionized and electron-attached states are open-shell states, so that single-reference CC based on RHF orbitals is then not usually appropriate for the calculation of entire potential- energy surfaces. One solution to these problems is to resort to multireference methods. An effective alternative in many cases is provided by the equation-of-motion coupled-cluster (EOM-CC) method. C. The equation-of-motion coupled-cluster method The basic idea of EOM-CC is to start with a conventional CC calculation on some initial state, usually a vonveniently chosen closed-shell state, and obtain the desired target state by application of a CI-like linear operator acting on the initial state CC wave function. 
Although the calculations for the two states must use the same set of nuclei in the same geometrical arrangement and the same set of spinorbitals defining a common Fermi vacuum state $|0\rangle$, they need not have the same number of electrons. In the EOM-CC method we consider two Schrödinger-equation eigenstates simultaneously, an initial state $\Psi_0$ and a target state $\Psi_k$,
$$\hat H\Psi_0 = E_0\Psi_0, \qquad \hat H\Psi_k = E_k\Psi_k. \tag{280}$$
The initial state is often referred to as the reference state. The aim of the method is to determine the energy difference
$$\omega_k = E_k - E_0 \tag{281}$$
If we use the normal-product form of the Hamiltonian, equations (280) become

$$\hat H_N\Psi_0 = \Delta E_0\Psi_0 \tag{282}$$
$$\hat H_N\Psi_k = \Delta E_k\Psi_k \tag{283}$$
where $\Delta E_0 = E_0 - E_{\rm ref}$ and $\Delta E_k = E_k - E_{\rm ref}$, with $E_{\rm ref} = \langle 0|\hat H|0\rangle$. Then we have
$$\omega_k = \Delta E_k - \Delta E_0 \tag{284}$$
The initial-state coupled-cluster wave function is represented by the action of an exponential wave operator $\Omega_0 = e^{\hat T}$ on a single-determinant reference function $|0\rangle$,
$$\Psi_0 = \Omega_0|0\rangle = e^{\hat T}|0\rangle \tag{285}$$
An operator $\hat R_k$ is used to generate the target state from the initial state,
$$\Psi_k = \hat R_k\Psi_0 \tag{286}$$
so that, using (285), the target-state Schrödinger equation (283) can be written in the form
$$\hat H_N\hat R_k e^{\hat T}|0\rangle = \Delta E_k\hat R_k e^{\hat T}|0\rangle \tag{287}$$
In the EOM-CC case, if all possible excitations from the initial state are included, we have
$$\hat R_k = r_0 + \sum_{i,a} r_i^a\{\hat a^\dagger\hat i\} + \dots \tag{288}$$
Since $\hat R_k$ is an excitation operator, it commutes with the CC cluster operator $\hat T$ and all its components. Multiplying (287) on the left by $e^{-\hat T}$ and using the commutation of $\hat R_k$ and $\hat T$, we get
$$\bar H\hat R_k|0\rangle = \Delta E_k\hat R_k|0\rangle \tag{289}$$
where $\bar H = e^{-\hat T}\hat H_N e^{\hat T}$, showing that $\hat R_k|0\rangle$ is a right eigenfunction of $\bar H$ with eigenvalue $\Delta E_k$. The operator $\bar H$ also has left eigenfunctions $\langle 0|\hat L_k$, with the same eigenvalues $\Delta E_k$ as the corresponding right eigenfunctions $\hat R_k|0\rangle$, satisfying
$$\langle 0|\hat L_k\bar H = \langle 0|\hat L_k\,\Delta E_k \tag{290}$$
The operator $\hat L_k$ is a de-excitation operator,
$$\hat L_k = l_0 + \sum_{i,a} l_a^i\{\hat i^\dagger\hat a\} + \dots \tag{291}$$
and therefore satisfies
$$\hat L_k\hat P = 0, \qquad \hat L_k = \hat L_k\hat Q \tag{292}$$
For the initial state ($k = 0$) we have $\hat R_0 = \hat 1$, but $\hat L_0 \neq \hat 1$. The two sets of eigenfunctions are biorthogonal and can be normalized to satisfy
$$\langle 0|\hat L_k\hat R_l|0\rangle = \delta_{kl} \tag{293}$$
They provide a resolution of the identity,
$$\hat 1 = \sum_k \hat R_k|0\rangle\langle 0|\hat L_k \tag{294}$$
Also, because $\hat R_0 = \hat 1$, we have
$$\langle 0|\hat L_k|0\rangle = \delta_{k0} \tag{295}$$
Since $\hat R_0 = \hat 1$, the initial-state version of (289) is
$$\bar H|0\rangle = \Delta E_0|0\rangle \tag{296}$$
Multiplying this equation on the left by $\hat R_k$ and subtracting it from (289), we obtain the EOM-CC equation in the form
$$[\bar H, \hat R_k]|0\rangle = (\Delta E_k - \Delta E_0)\hat R_k|0\rangle \tag{297}$$
or
$$(\bar H\hat R_k|0\rangle)_C = \omega_k\hat R_k|0\rangle \tag{298}$$

D. Multireference coupled-cluster methods

As in the case of quasidegenerate perturbation theory, multireference coupled-cluster (MRCC) theory is designed to deal with electronic states for which a zero-order description in terms of a single Slater determinant does not provide an adequate starting point for calculating the electron correlation effects. All multireference methods are based on the generalized Bloch equation (Lindgren 1974)
$$[\Omega, \hat H_0]\hat P = \hat V\Omega\hat P - \Omega\hat P\hat V\Omega\hat P \tag{299}$$

The projection operator $\hat P$ projects onto a model space spanned by a set of model functions $\{\Phi_\alpha\}$,
$$\hat P = \sum_\alpha |\Phi_\alpha\rangle\langle\Phi_\alpha| = \sum_\alpha \hat P_\alpha, \qquad \hat P_\alpha = |\Phi_\alpha\rangle\langle\Phi_\alpha| \tag{300}$$
and $\Omega = \Omega\hat P$ is the wave operator, which, when operating on the model space, produces the space spanned by the perturbed wave functions,
$$\tilde\Psi_\alpha = \Omega\Phi_\alpha \tag{301}$$
By rearranging the terms in (299), using $\hat H_0 + \hat V = \hat H$, and noting that $\hat P\Omega = \hat P$, we obtain
$$\hat H\Omega = \Omega(\hat H_0\hat P + \hat V\Omega) = \Omega(\hat H_0 + \hat V)\Omega \tag{302}$$
or
$$\hat H\Omega = \Omega\hat H\Omega \tag{303}$$
The functions $\tilde\Psi_\alpha$ are not individually eigenfunctions of $\hat H$ but span the space of eigenfunctions $\Psi_\alpha$ for which the model space forms a zero-order approximation,
$$\hat H\Psi_\alpha = E_\alpha\Psi_\alpha \tag{304}$$
Writing $\Psi_\alpha = \Omega\tilde\Phi_\alpha$, where $\tilde\Phi_\alpha = \hat P\Psi_\alpha$ is the model-space projection of $\Psi_\alpha$, and applying $\hat P$ from the left, we get the matrix eigenvalue equation
$$\hat P\hat H\Omega\tilde\Phi_\alpha = E_\alpha\tilde\Phi_\alpha \tag{305}$$
The operator
$$\hat H^{\rm eff} = \hat P\hat H\Omega \tag{306}$$
which operates entirely in $\hat P$-space and whose eigenfunctions and eigenvalues are $\tilde\Phi_\alpha$ and $E_\alpha$, respectively, is called the effective Hamiltonian operator. With this notation, the generalized Bloch equation (303) can be written in the form
$$\hat H\Omega = \Omega\hat H^{\rm eff} \tag{307}$$
The Hilbert-space approach to MRCC theory assumes a separate Fermi-vacuum definition, and thus a separate partition of the spinorbitals into hole and particle states, for each model-space determinant. The wave operator is separated into individual wave operators for the different model states,

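$$\Omega = \sum_\alpha \Omega_\alpha = \sum_\alpha e^{\hat T^\alpha}\hat P_\alpha \tag{308}$$
Substituting the definition of the wave operator (308), the generalized Bloch equation (307) may be written in the form
$$\sum_\beta \hat H e^{\hat T^\beta}\hat P_\beta = \sum_\beta e^{\hat T^\beta}\hat P_\beta\hat H^{\rm eff}\hat P \tag{309}$$
Multiplying on the left by $e^{-\hat T^\alpha}$ and on the right by $\hat P_\alpha$, we obtain
$$e^{-\hat T^\alpha}\hat H e^{\hat T^\alpha}\hat P_\alpha = \sum_\beta e^{-\hat T^\alpha}e^{\hat T^\beta}\hat P_\beta\hat H^{\rm eff}\hat P_\alpha \tag{310}$$
Applying an external-space determinant $\langle\Phi^{ab\dots}_{ij\dots}(\alpha)|$, with $\Phi^{ab\dots}_{ij\dots}(\alpha) = \hat a^\dagger\hat i\,\hat b^\dagger\hat j\dots\Phi_\alpha$, on the left and the model function $\Phi_\alpha$ on the right, we obtain equations for the external-excitation amplitudes $t^{ab\dots}_{ij\dots}(\alpha)$ contained in the operators $\hat T^\alpha$,
$$\langle\Phi^{ab\dots}_{ij\dots}(\alpha)|e^{-\hat T^\alpha}\hat H e^{\hat T^\alpha}|\Phi_\alpha\rangle = \sum_\beta \langle\Phi^{ab\dots}_{ij\dots}(\alpha)|e^{-\hat T^\alpha}e^{\hat T^\beta}|\Phi_\beta\rangle\langle\Phi_\beta|\hat H^{\rm eff}|\Phi_\alpha\rangle \tag{311}$$
The matrix elements of the effective Hamiltonian $\hat H^{\rm eff}$ appearing in this equation are obtained, using (306) and (308), as
$$H^{\rm eff}_{\beta\alpha} = \langle\Phi_\beta|\hat H^{\rm eff}|\Phi_\alpha\rangle = \langle\Phi_\beta|\hat H\Omega|\Phi_\alpha\rangle = \langle\Phi_\beta|\hat H e^{\hat T^\alpha}|\Phi_\alpha\rangle \tag{312}$$
We can use the CC effective Hamiltonian for the model function $\Phi_\alpha$,
$$\bar H^\alpha = e^{-\hat T^\alpha}\hat H e^{\hat T^\alpha} \tag{313}$$
Then the equations for the external-excitation amplitudes take the form
$$\langle\Phi^{ab\dots}_{ij\dots}(\alpha)|\bar H^\alpha|\Phi_\alpha\rangle = \sum_\beta \langle\Phi^{ab\dots}_{ij\dots}(\alpha)|e^{-\hat T^\alpha}e^{\hat T^\beta}|\Phi_\beta\rangle H^{\rm eff}_{\beta\alpha} \tag{314}$$
To evaluate the matrix element in (312), we note that
$$\langle\Phi_\alpha|e^{-\hat T^\alpha} = \langle\Phi_\alpha|(1 - \hat T^\alpha + \dots) = \langle\Phi_\alpha| \tag{315}$$
and therefore
$$H^{\rm eff}_{\alpha\alpha} = \langle\Phi_\alpha|e^{-\hat T^\alpha}\hat H e^{\hat T^\alpha}|\Phi_\alpha\rangle = \langle\Phi_\alpha|\bar H^\alpha|\Phi_\alpha\rangle \tag{316}$$
More generally, we insert $e^{\hat T^\alpha}e^{-\hat T^\alpha} = 1$ in (312) and obtain
$$H^{\rm eff}_{\beta\alpha} = \langle\Phi_\beta|e^{\hat T^\alpha}\bar H^\alpha|\Phi_\alpha\rangle \tag{317}$$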
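The identity (315) can be spelled out in one more step. The short worked equation below is a sketch, assuming that $\hat T^\alpha$ is expanded in excitation operators relative to $\Phi_\alpha$ with the amplitudes $t(\alpha)$ of Eq. (311); the explicit single- and double-excitation form is written here only for illustration.

```latex
% Acting to the left, every excitation operator in T^alpha tries to
% annihilate a particle state that is unoccupied in Phi_alpha:
\langle\Phi_\alpha|\hat T^\alpha
  = \sum_{i,a} t^{a}_{i}(\alpha)\,\langle\Phi_\alpha|\hat a^\dagger\hat i
  + \sum_{i<j,\,a<b} t^{ab}_{ij}(\alpha)\,
      \langle\Phi_\alpha|\hat a^\dagger\hat i\,\hat b^\dagger\hat j
  + \dots = 0 ,
% so every term beyond the unit operator in the expansion of the
% exponential vanishes:
\langle\Phi_\alpha|e^{-\hat T^\alpha}
  = \langle\Phi_\alpha|\Bigl(1 - \hat T^\alpha
      + \tfrac{1}{2}(\hat T^\alpha)^2 - \dots\Bigr)
  = \langle\Phi_\alpha| .
```

The same argument does not apply to $\langle\Phi_\beta|e^{-\hat T^\alpha}$ with $\beta \neq \alpha$, which is why the off-diagonal elements keep the factor $e^{\hat T^\alpha}$ in (317).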

Next, we consider the first factor in the sum on the r.h.s. of (314),
$$S^{ab\dots}_{ij\dots}(\alpha\beta) = \langle\Phi^{ab\dots}_{ij\dots}(\alpha)|e^{-\hat T^\alpha}e^{\hat T^\beta}|\Phi_\beta\rangle = \langle\Phi^{ab\dots}_{ij\dots}(\alpha)|e^{-\hat T^\alpha}e^{\hat T^\beta}|\Phi^{xy\dots}_{uv\dots}(\alpha)\rangle \tag{318}$$
where the model function $\Phi_\beta$ has been written as an excitation $\Phi^{xy\dots}_{uv\dots}(\alpha)$ relative to $\Phi_\alpha$. Inserting a resolution of the identity between the two exponentials, we obtain
$$S^{ab\dots}_{ij\dots}(\alpha\beta) = \sum_I \langle\Phi^{ab\dots}_{ij\dots}(\alpha)|e^{-\hat T^\alpha}|\Phi_I\rangle\langle\Phi_I|e^{\hat T^\beta}|\Phi^{xy\dots}_{uv\dots}(\alpha)\rangle \tag{319}$$
The series expansions of the exponentials in this equation result in expressions involving CI-like amplitudes, corresponding to linear combinations of $\hat T$ amplitudes and their products. Combining the matrix $H^{\rm eff}$ with the CI-like amplitudes, an eigenvalue equation for $H^{\rm eff}$ can be derived. The diagrammatic representation and the evaluation of this expansion are described in detail by Paldus, Li and Petraco (2004). Several applications of Hilbert-space SU-MRCCSD were discussed by Li and Paldus (2003c, 2004), who compared model spaces of different dimensions with high-excitation single-reference CI.

XII. INTERMOLECULAR INTERACTIONS

Intermolecular interactions (forces) determine the structure and properties of clusters, nanostructures, and condensed phases, including biosystems. The interaction energy of a cluster of N atoms or molecules (called monomers) with n electrons is defined in the following way. The time-independent Schrödinger equation in the Born-Oppenheimer approximation can be written as
$$\hat H(r_1,\dots,r_n;Q_1,\dots,Q_N)\,\Psi(x_1,\dots,x_n;Q_1,\dots,Q_N) = E_{\rm tot}(Q_1,\dots,Q_N)\,\Psi(x_1,\dots,x_n;Q_1,\dots,Q_N)$$
with the usual notation for the electron coordinates and with the variable $Q_i = (R_i,\omega_i,\xi_i)$ denoting the set of coordinates needed to specify the geometry of the ith monomer: the position of the center of mass, $R_i$, a set of three Euler angles $\omega_i$ defining the orientation of the monomer, and a set of internal monomer coordinates, $\xi_i$. The interaction energy is then defined as the difference between the total energy $E_{\rm tot}$ and the sum of monomer energies $E_i(\xi_i)$,
$$E_{\rm int}(Q_1,\dots,Q_N) = E_{\rm tot}(Q_1,\dots,Q_N) - \sum_i E_i(\xi_i).$$
Interaction energies defined in this way are sometimes called "vertical" interaction energies since the geometry of each monomer is the same as its geometry in the cluster. The interaction energy of an N-mer can be represented in the form of the following many-body expansion (assuming rigid monomers)
$$E_{\rm int}[N] = E_{\rm int}[2,N] + E_{\rm int}[3,N] + \dots + E_{\rm int}[N,N],$$

where the term $E_{\rm int}[k,N]$ is called the k-body contribution to the N-mer energy. The two-body contribution is just the sum of interaction energies of all isolated monomer pairs, i.e., all dimers,
$$E_{\rm int}[2,N] = \sum_{i<j} E_{\rm int}(Q_i,Q_j)[2,2].$$
Analogously, the three-body contribution is
$$E_{\rm int}[3,N] = \sum_{i<j<k} E_{\rm int}(Q_i,Q_j,Q_k)[3,3].$$
This definition applied to a trimer shows that the trimer three-body energy is
$$E_{\rm int}(Q_1,Q_2,Q_3)[3,3] = E_{\rm int}[3] - \sum_{i<j}^{3} E_{\rm int}(Q_i,Q_j)[2,2].$$
The higher-rank terms are defined in an analogous way.

Any of the electronic structure methods can be used to compute interaction energies from the definition given above (the so-called supermolecular approach). However, interaction energies are usually more than an order of magnitude smaller in absolute value than chemical-bond energies and at least four orders of magnitude smaller in absolute value than the total electronic energies of atoms or molecules. The most natural method for investigating these phenomena is therefore to start from isolated monomers and treat the interactions as small perturbations of this system. Such an approach is called symmetry-adapted perturbation theory (SAPT). An early version of SAPT was introduced already in the 1930s by Eisenschitz and London. A generally applicable SAPT was developed in the late 1970s and 1980s.

A. Symmetry-adapted perturbation theory

The simplest perturbation theory of intermolecular interactions is just the standard Rayleigh-Schrödinger (RS) perturbation theory discussed earlier. For a dimer, we partition the total Hamiltonian as
$$\hat H = \hat H_0 + \hat V = \hat H_A + \hat H_B + \hat V$$
where $\hat H_X$ is the Hamiltonian of the isolated monomer X and $\hat V$ is the intermonomer interaction potential
$$\hat V = \sum_{\alpha\in A}\sum_{\beta\in B}\frac{Z_\alpha Z_\beta}{R_{\alpha\beta}} - \sum_{\alpha\in A}\sum_{j\in B}\frac{Z_\alpha}{r_{j\alpha}} - \sum_{i\in A}\sum_{\beta\in B}\frac{Z_\beta}{r_{i\beta}} + \sum_{i\in A}\sum_{j\in B}\frac{1}{r_{ij}}.$$
The zeroth-order problem is
$$(\hat H_0 - E_0)\Phi_0 = 0$$

where $\Phi_0 = \Phi_A\Phi_B$ and $E_0 = E_A + E_B$. We get the standard set of RS equations
$$(\hat H_0 - E_0)\Phi^{(n)}_{\rm RS} = -\hat V\Phi^{(n-1)}_{\rm RS} + \sum_{k=1}^{n} E^{(k)}_{\rm RS}\Phi^{(n-k)}_{\rm RS}$$
$$E^{(n)}_{\rm RS} = \langle\Phi_0|\hat V\Phi^{(n-1)}_{\rm RS}\rangle.$$
It can be shown (a homework problem) that the first-order energy
$$E^{(1)}_{\rm RS} = \langle\Phi_0|\hat V\Phi_0\rangle$$
can be expressed as a Coulomb interaction of unperturbed charge densities of monomers, i.e., an electrostatic interaction. Therefore, this term is usually called the electrostatic energy and denoted as $E^{(1)}_{\rm elst}$. The second-order energy can be written in the form of the usual spectral expansion
$$E^{(2)}_{\rm RS} = \sum_{k+l\neq 0}\frac{|\langle\Phi^A_0\Phi^B_0|\hat V|\Phi^A_k\Phi^B_l\rangle|^2}{E^A_0 + E^B_0 - E^A_k - E^B_l}.$$
This energy consists of two physically distinct components, the induction energy
$$E^{(2)}_{\rm ind} = \sum_{k\neq 0}\frac{|\langle\Phi^A_0\Phi^B_0|\hat V|\Phi^A_k\Phi^B_0\rangle|^2}{E^A_0 - E^A_k} + \sum_{l\neq 0}\frac{|\langle\Phi^A_0\Phi^B_0|\hat V|\Phi^A_0\Phi^B_l\rangle|^2}{E^B_0 - E^B_l}$$
and the dispersion energy
$$E^{(2)}_{\rm disp} = \sum_{k\neq 0}\sum_{l\neq 0}\frac{|\langle\Phi^A_0\Phi^B_0|\hat V|\Phi^A_k\Phi^B_l\rangle|^2}{E^A_0 + E^B_0 - E^A_k - E^B_l}.$$
In the induction energy expression, one can integrate in the first (second) sum over the coordinates of system B (A), obtaining in this way the electrostatic potential of monomer B (A) acting on system A (B). Thus, the induction energy is the response of a monomer to the electrostatic field of the interacting partner. The dispersion energy term is a pure quantum effect resulting from the correlation of electronic positions in system A with those in system B.

The RS approach uses wave functions that are not globally antisymmetric; they are antisymmetric only with respect to exchanges of electrons within each monomer. We often say that the RS theory violates the Pauli exclusion principle. Despite this, it was shown by numerical calculations for one- and two-electron monomers that the RS theory actually does converge to the correct ground-state energy. However, this is no longer true if even one of the monomers includes three or more electrons. We will return to this subject later on.
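The split of the second-order sum over states into induction and dispersion can be illustrated with a few lines of Python. This is only a sketch on a made-up model: the monomer energies `e_a`, `e_b` and the matrix elements `v0[k][l]`, standing for $\langle\Phi^A_0\Phi^B_0|\hat V|\Phi^A_k\Phi^B_l\rangle$, are arbitrary illustrative numbers, not data for any real dimer.

```python
def second_order_energies(e_a, e_b, v0):
    """Split the RS second-order energy into induction and dispersion.

    e_a[k], e_b[l] : monomer energies (index 0 is the ground state).
    v0[k][l]       : real matrix element <A0 B0| V |Ak Bl> (hypothetical values).
    """
    induction = 0.0
    dispersion = 0.0
    for k in range(len(e_a)):
        for l in range(len(e_b)):
            if k == 0 and l == 0:
                continue  # the unperturbed state is excluded from the sum
            term = v0[k][l] ** 2 / (e_a[0] + e_b[0] - e_a[k] - e_b[l])
            if k == 0 or l == 0:
                induction += term   # only one monomer is excited
            else:
                dispersion += term  # both monomers are excited simultaneously
    return induction, dispersion


# Arbitrary toy model: three states on A, two states on B.
e_a = [0.0, 0.5, 0.9]
e_b = [0.0, 0.4]
v0 = [[0.01, 0.02],
      [0.03, 0.015],
      [0.01, 0.005]]

e_ind, e_disp = second_order_energies(e_a, e_b, v0)
e2_total = e_ind + e_disp
print(f"E_ind = {e_ind:.6e}, E_disp = {e_disp:.6e}, E2 = {e2_total:.6e}")
```

Both contributions come out negative for the ground state because every denominator is negative, in line with the remark that the second-order term lowers the energy.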
An even more serious problem is that the RS approach does not give the repulsive walls at short intermonomer distances, i.e., it becomes unphysical there. It is easiest to see this in interactions of rare-gas atoms, where the electrostatic energy is very small (cf. classical electrostatic interactions of spherical charge distributions), so the interaction energy is dominated by the second-order term, which we know is negative for the ground state. Despite problems at small separations, the RS method gives nearly exact energies at large separations, as will be discussed below, and is the basis for the multipole expansion of the interaction energy, also discussed later.

To solve this problem of the RS method, one has to antisymmetrize the wave functions. We cannot use the RS approach anymore since already $\Phi_0$ has to be antisymmetrized, so that the zeroth-order equation does not hold. There are several ways of introducing the antisymmetry constraint in a perturbative way, leading to the family of SAPT methods. One way to derive several variants of SAPT is to iterate the Bloch form of Schrödinger's equation
$$\Psi = \Phi_0 + \hat R_0\left(\langle\Phi_0|\hat V\Psi\rangle - \hat V\right)\Psi \tag{320}$$
where
$$\hat R_0 = \sum_{m\neq 0}\frac{|\Phi_m\rangle\langle\Phi_m|}{E_m - E_0} \tag{321}$$
is the same resolvent operator as used before. To derive Bloch's equation, write Schrödinger's equation as
$$(\hat H_0 + \hat V)\Psi = (E_0 + \Delta E)\Psi \quad\text{or}\quad (\hat H_0 - E_0)\Psi = (\Delta E - \hat V)\Psi \tag{322}$$
and act from the left with $\hat R_0$. Since $\hat R_0(\hat H_0 - E_0) = 1 - |\Phi_0\rangle\langle\Phi_0|$ and assuming the intermediate normalization $\langle\Phi_0|\Psi\rangle = 1$, we get the Bloch equation, where $\Delta E = \langle\Phi_0|\hat V\Psi\rangle$ follows from multiplication of Eq. (322) by $\langle\Phi_0|$. The Bloch equation can be iterated, starting from replacing $\Psi$ by $\Phi_0$. We get
$$\Psi_n = \Phi_0 + \hat R_0(E_n - \hat V)\Psi_{n-1}$$
with
$$E_n = \langle\Phi_0|\hat V\Psi_{n-1}\rangle.$$
Note that n is not the order of perturbation theory here. This set of equations is equivalent to RS perturbation theory in the sense that consecutive iterations reproduce higher and higher orders of this theory. However, since $\hat A\Psi = \Psi$, where $\hat A$ is the antisymmetrizer defined earlier, we can insert $\hat A$ in front of $\Psi$ on the right-hand side of Eq. (320).
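The plain (unsymmetrized) iteration of the Bloch equation described above can be tried out on a small matrix model. The sketch below is an illustration only: it assumes a finite basis of $\hat H_0$ eigenvectors, and the numbers in `e0` and `v` are arbitrary. The function name `iterate_bloch` is introduced here for the example.

```python
import math


def iterate_bloch(e0, v, n_iter=50):
    """Iterate Psi_n = Phi_0 + R_0 (E_n - V) Psi_{n-1}, E_n = <Phi_0|V Psi_{n-1}>.

    e0 : eigenvalues of H_0 (index 0 is the unperturbed state Phi_0).
    v  : matrix of V in the basis of H_0 eigenvectors (hypothetical numbers).
    Returns the last energy iterate and the wave-function vector.
    """
    dim = len(e0)
    psi = [1.0] + [0.0] * (dim - 1)  # start from Psi_0 = Phi_0
    energy = 0.0
    for _ in range(n_iter):
        v_psi = [sum(v[r][c] * psi[c] for c in range(dim)) for r in range(dim)]
        energy = v_psi[0]  # E_n = <Phi_0| V Psi_{n-1}>
        new_psi = [1.0]    # intermediate normalization: <Phi_0|Psi_n> = 1
        for m in range(1, dim):
            # R_0 divides the m-th component by (E_m - E_0) and removes Phi_0
            new_psi.append((energy * psi[m] - v_psi[m]) / (e0[m] - e0[0]))
        psi = new_psi
    return energy, psi


# Two-level toy model with a weak perturbation.
e0 = [0.0, 1.0]
v = [[0.02, 0.10],
     [0.10, -0.03]]

energy, psi = iterate_bloch(e0, v)

# Exact lowest eigenvalue of H_0 + V for the 2x2 case (closed form).
h00, h11, h01 = e0[0] + v[0][0], e0[1] + v[1][1], v[0][1]
exact = 0.5 * (h00 + h11) - math.sqrt(0.25 * (h00 - h11) ** 2 + h01 ** 2)
print(energy, exact - e0[0])
```

For a perturbation this weak the iterates converge quickly to the exact energy shift; for stronger coupling the simple fixed-point iteration may converge slowly or not at all, which mirrors the convergence questions discussed in the text.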
After iterating, one gets the following set of equations
$$\Psi_n = \Phi_0 + \hat R_0\left(\langle\Phi_0|\hat V\hat G\Psi_{n-1}\rangle - \hat V\right)\hat F\Psi_{n-1}$$
$$E_n = \frac{\langle\Phi_0|\hat V\hat G'\Psi_{n-1}\rangle}{\langle\Phi_0|\hat G'\Psi_{n-1}\rangle} \tag{323}$$
where $\hat F$, $\hat G$, and $\hat G'$ can be $\hat A$ or 1. Particular choices of these operators lead to the following SAPT methods:

F    G    G'   Psi_0     name
1    1    A    Phi_0     Symmetrized RS (SRS)
A    1    1    A Phi_0   Jeziorski-Kolos (JK)
A    A    A    A Phi_0   Eisenschitz-London-Hirschfelder-van der Avoird (EL-HAV)

One may think that the EL-HAV method, applying the antisymmetrizer in all possible places, should work best. This is not the case, since at large intermolecular separations this method in low order is not compatible with the RS approach (we omit the proof), and, as mentioned above, the RS approach is very accurate at such separations. The RS approach is accurate there because the exchange effects in interaction energies, i.e., the effects resulting from the global antisymmetrization, decay exponentially with increasing intermonomer separation R, whereas the total interaction energy, as will be shown below, decays as inverse powers of R. One can see the former from the zeroth-order antisymmetrized wave function for two interacting hydrogen atoms
$$\hat A[1s_A(r_1)1s_B(r_2)] = (1 \pm \hat P_{12})\,1s_A(r_1)1s_B(r_2) = 1s_A(r_1)1s_B(r_2) \pm 1s_A(r_2)1s_B(r_1)$$
where we use the spin-free approach and therefore consider two types of states which, after multiplication by singlet and triplet spin functions, will form antisymmetric wave functions. The first term in the last part of the equation written above is the $\Phi_0$ of the RS theory and the second term is the exchange one. The latter term, when used in the bra of expression (323), will lead to integrals where electron 1 is on center A in the bra and on center B in the ket, and similarly for electron 2. The effect is that all such integrals are proportional to two-center overlap integrals, and such integrals have to decay exponentially since the wave functions decay exponentially. In contrast to EL-HAV, the SRS and JK methods have correct asymptotics.
This correctness is evident for SRS since the SRS wave function corrections are the same as those of RS. Moreover, the two latter methods are identical in the first two orders. In practice, modern SAPT implementations always use SRS due to its simplicity.

The derivation presented above assumed that one knows the exact wave functions of the monomers. In practice, this is possible only for the smallest atoms. Thus, a generally applicable SAPT has to use methods analogous to the MBPT methods discussed earlier. The zeroth-order approximation that can be computed accurately for very large systems is the Hartree-Fock level. One then simultaneously accounts for the intramonomer correlation energies, e.g., at the MP2 level or at the CCSD level, and the SAPT expansion effects.
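The different asymptotics of exchange and long-range terms discussed above for two hydrogen atoms can be made concrete numerically. The sketch below uses the closed-form overlap of two hydrogen 1s orbitals with unit exponent, $S(R) = e^{-R}(1 + R + R^2/3)$ in atomic units, and compares it with an $R^{-6}$ power law chosen here only as a representative inverse-power decay; the function name and the sample distances are illustrative.

```python
import math


def overlap_1s(r):
    """Overlap integral of two hydrogen 1s orbitals (exponent 1) at distance r in bohr."""
    return math.exp(-r) * (1.0 + r + r * r / 3.0)


def ratio_to_power_law(r, power=6):
    """Ratio of the exponentially decaying overlap to an inverse-power term R**(-power)."""
    return overlap_1s(r) * r ** power


for r in (10.0, 20.0, 40.0):
    print(f"R = {r:5.1f} bohr   S = {overlap_1s(r):.3e}   "
          f"R^-6 = {r ** -6:.3e}   ratio = {ratio_to_power_law(r):.3e}")
```

The ratio keeps decreasing as R grows, so any overlap-proportional contribution eventually becomes negligible compared with terms decaying as inverse powers of R.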


We conclude this section by examining the spectrum of Li-H in order to understand why RS perturbation theory has to diverge for dimers containing one or two monomers with three or more electrons. The simplest example of such a system is Li-H. The left column of the figure shows the spectrum of the unperturbed system, i.e., the energies $E^{\rm Li}_k + E^{\rm H}_l$. The right column shows the physical spectrum of LiH at the separation corresponding to the minimum of the dimer potential. The word physical indicates that the wave functions are completely antisymmetrized. The middle column shows the spectrum of LiH at the same R but in the space of functions that are antisymmetrized only within Li. Since the Pauli exclusion principle does not apply to such states, one may have states well approximated by a wave function with three electrons occupying the 1s orbital: two electrons of Li and one electron coming from H. One can show (see a homework problem) that the lowest energy of such a system is much below the lowest physical energy and that the continuous spectrum starts below the lowest physical state. The physical states thus appear in this spectrum "submerged" in the unphysical (sometimes called Pauli-forbidden) continuum. Since this means that the physical states are degenerate with this continuum, one cannot expect convergence.

FIG. 8. Spectrum of Li-H.

B. Asymptotic expansion of interaction energy

When the distance between monomers becomes large, one can expand the interaction potential $\hat V$ in a multipole series. This series is defined in most E&M textbooks. For the

electron repulsion term, we have
$$\frac{1}{|r_1 - r_2|} = \sum_{l_A,l_B=0}^{\infty}\ \sum_{m=-l_<}^{l_<} K^m_{l_A l_B}\,R^{-(l_A+l_B+1)}\,Q^m_{l_A}(r_1)\,Q^{-m}_{l_B}(r_2),$$
where $l_< = \min(l_A,l_B)$, K is a combinatorial coefficient
$$K^m_{l_A l_B} = (-1)^{l_B}\frac{(l_A+l_B)!}{[(l_A+m)!\,(l_A-m)!\,(l_B+m)!\,(l_B-m)!]^{1/2}}$$
and the solid harmonics $Q^m_l(r)$ (also called the $2^l$-pole moment operators) are expressed through the standard spherical harmonics,
$$Q^m_l(r) = \left(\frac{4\pi}{2l+1}\right)^{1/2} r^l\,Y^m_l(\hat r).$$
One of the homework problems shows that the first-order electrostatic energy can be written as
$$E^{(1)}_{\rm elst} = \int\!\!\int \rho_A(r_1)\,v(r_1,r_2)\,\rho_B(r_2)\,d^3r_1\,d^3r_2$$
where
$$v(r_1,r_2) = \frac{1}{|r_1 - r_2|} - \frac{1}{N_B}\sum_\beta\frac{Z_\beta}{|r_1 - R_\beta|} - \frac{1}{N_A}\sum_\alpha\frac{Z_\alpha}{|r_2 - R_\alpha|} + \frac{1}{N_A N_B}\sum_{\alpha,\beta}\frac{Z_\alpha Z_\beta}{|R_\alpha - R_\beta|} \tag{324}$$
with the sums over $\alpha$ and $\beta$ running over the nuclei of systems A and B, respectively, $Z_\gamma$ denoting the nuclear charges, and $N_A$, $N_B$ denoting the numbers of electrons to which the densities $\rho_A$, $\rho_B$ integrate. Let us apply the asymptotic expansion to the first term in this expression,
$$\int\!\!\int \rho_A(r_1)\frac{1}{|r_1 - r_2|}\rho_B(r_2)\,d^3r_1\,d^3r_2 = \sum_{l_A,l_B=0}^{\infty}\ \sum_{m=-l_<}^{l_<} K^m_{l_A l_B}\,R^{-(l_A+l_B+1)}\int\rho_A(r_1)Q^m_{l_A}(r_1)\,d^3r_1\int\rho_B(r_2)Q^{-m}_{l_B}(r_2)\,d^3r_2$$
The first (second) of the two integrals can be recognized as a component of the multipole moment of monomer A (B) of rank $l_A$ ($l_B$). If the molecules are neutral and polar, the first nonvanishing moment is the dipole moment. For such systems the electrostatic energy decays as $1/R^3$. Similar derivations can be performed for the remaining terms in Eq. (324) and in higher orders of RS perturbation theory.

C. Intermolecular interactions in DFT

Density functional theory (DFT) is the most often used method in computational studies of matter. In the standard Kohn-Sham (KS) implementation, all electron correlation effects are included in the exchange-correlation energy. The exact form of this energy is unknown, and a large number of approximate functionals have been constructed to describe it, as discussed earlier. While such functionals describe many properties of matter quite accurately, there are also several properties where all existing functionals fail. One such example is intermolecular interactions, which involve atoms or molecules separated by several angstroms or more. The local density approximation (LDA) obviously misses any interactions between distant regions. The semilocal generalized gradient approximations (GGAs) still cannot describe long-range electron correlations due to the limited range of the exchange-correlation hole. The size of such a hole is of the order of 1 Å, so correlation interactions between regions separated by much more than this distance cannot be recovered. One can say that these methods are myopic, with a range of vision of about 1 Å.

Interaction energies given by most DFT methods can be brought to agreement with accurate interaction energies by adding a negative correction, which at very large R (for systems with no dipole and quadrupole moments) is simply the dispersion energy. For shorter R, the dispersion energy has to be tapered, differently for each DFT method. This observation led to a family of methods, referred to as DFT+D, that supplement DFT interaction energies by a "dispersion" correction (computed, for example, as an atom-atom function fitted to results of calculations with wave function methods and properly tapered). The DFT+D methods are reasonably successful, reproducing complete interaction energy curves with errors of the order of a few percent, but this approach is no longer a first-principles one. Several so-called nonlocal density functionals have also been developed. These are first-principles approaches but at the present time are less accurate than DFT+D methods.

XIII. DIFFUSION MONTE CARLO

The diffusion Monte Carlo (DMC) method provides a way, different from those we have seen so far in this course, to solve the time-dependent Schrödinger equation of a system. Let us consider a single particle of mass m in a one-dimensional box. The transformation from real time to imaginary time is made by the following changes:
$$\tau = it \tag{325}$$
and $V(x) \to V(x) - E_R$. Using this transformation, the Schrödinger equation reads
$$\hbar\,\partial_\tau\psi = \frac{\hbar^2}{2m}\,\partial^2_x\psi - [V(x) - E_R]\psi \tag{326}$$
One can write the solution of this equation as
$$\psi(x,\tau) = \sum_n c_n\phi_n(x)\exp[-(E_n - E_R)\tau/\hbar] \tag{327}$$

where $\phi_n(x)$ and $E_n$ are the eigenstates and eigenvalues of the time-independent Schrödinger equation, respectively. There are three possibilities for $\tau \to \infty$: (i) if $E_R > E_0$, the wavefunction diverges exponentially fast; (ii) if $E_R < E_0$, the wavefunction vanishes; and (iii) if $E_R = E_0$, we get $\psi(x,\tau) \to c_0\phi_0(x)$. This behavior provides the basis of the DMC method: for $E_R = E_0$ the wavefunction $\psi(x,\tau)$ converges to the ground state $\phi_0(x)$ regardless of the choice of the initial wavefunction $\psi(x,0)$, as long as there is an overlap between the initial wavefunction and the ground state, namely as long as $c_0 \neq 0$.

The path integral method can be used to solve Eq. (326). Readers are referred to standard quantum mechanics textbooks to convince themselves that the following equation is true:
$$\psi(x,\tau) = \lim_{N\to\infty}\int\prod_{j=0}^{N-1}dx_j\ \prod_{n=1}^{N}W(x_n)P(x_n,x_{n-1})\,\psi(x_0,0) \tag{328}$$
where the probability density P and the weight function W are given by
$$P(x_n,x_{n-1}) = \left(\frac{m}{2\pi\hbar\Delta\tau}\right)^{1/2}\exp\left[-\frac{m(x_n - x_{n-1})^2}{2\hbar\Delta\tau}\right] \tag{329}$$
$$W(x_n) = \exp[-(V(x_n) - E_R)\Delta\tau/\hbar] \tag{330}$$
with $\Delta\tau = \tau/N$. Note that $\int P(x,y)\,dy = 1$ and that P is a Gaussian probability density for the random variable $x_n$ with mean $x_{n-1}$ and standard deviation $\sigma = \sqrt{\hbar\Delta\tau/m}$.

Equation (328) has to be solved numerically, and one may use the so-called Monte Carlo method. In this method an N-dimensional integral
$$I = \int\prod_{j=0}^{N-1}dx_j\ f(x_0,\dots,x_{N-1})\,P(x_0,\dots,x_{N-1}) \tag{331}$$
with P being a probability density can be approximated as
$$I \approx \frac{1}{\mathcal N}\sum_{i=1,\ x^{(i)}\in P}^{\mathcal N} f(x^{(i)}_0,\dots,x^{(i)}_{N-1}) \tag{332}$$
where $x^{(i)}\in P$ means that the points $x^{(i)}_j$, $i = 1,2,\dots,\mathcal N$; $j = 0,1,\dots,N-1$, are selected randomly with probability density P. The larger the number of sample points $\mathcal N$, the better the approximation for I.

While the Monte Carlo method is able to calculate $\psi(x,\tau)$, it is unable to find $E_0$ and $\phi_0(x)$. An improvement over this method is called diffusion Monte Carlo, which will be explained here. The basic idea is to treat the wavefunction as a probability density, sampling the initial wavefunction $\psi(x_0,0)$ at $N_0$ points.
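The Monte Carlo estimate of Eq. (332) can be sketched in one dimension as follows. This is a minimal illustration, assuming that P is a Gaussian density like the one in Eq. (329); the function name `mc_average`, the test integrand $f(x) = x^2$, and all numerical values are choices made here for the example.

```python
import random


def mc_average(f, mean, sigma, n_samples, seed=0):
    """Estimate I = integral of f(x) P(x) dx by averaging f over samples drawn from P.

    P is taken to be a Gaussian with the given mean and standard deviation.
    """
    rng = random.Random(seed)  # fixed seed so that the run is reproducible
    total = 0.0
    for _ in range(n_samples):
        total += f(rng.gauss(mean, sigma))
    return total / n_samples


sigma = 0.5
# For f(x) = x^2 and a zero-mean Gaussian the exact integral is sigma^2.
estimate = mc_average(lambda x: x * x, mean=0.0, sigma=sigma, n_samples=200_000)
print(f"Monte Carlo estimate = {estimate:.4f}, exact = {sigma ** 2:.4f}")
```

The statistical error of such an estimate decreases as the inverse square root of the number of samples, which is the quantitative content of the remark that larger samples give a better approximation.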
In fact, this method generates $N_0$ Gaussian random walkers which evolve in time:
$$x^{(i)}_n = x^{(i)}_{n-1} + \sigma\rho^{(i)}_n \tag{333}$$
where $x^{(i)}_n$ is generated according to (329) with mean value $x^{(i)}_{n-1}$ and standard deviation $\sigma$, and $\rho^{(i)}_n$ is a Gaussian random number with mean 0 and variance 1. This stochastic process looks exactly like a Brownian diffusion process. The generated "random walkers" are called "particles" or "replicas" in the DMC method. Instead of tracing the motion of each particle, one follows the motion of the whole ensemble of replicas. The integrand in (328) can be interpreted as
$$W(x_N)P(x_N,x_{N-1})\cdots W(x_2)P(x_2,x_1)\,W(x_1)P(x_1,x_0)\,\psi(x_0,0) \tag{334}$$
where $\psi(x_0,0)$, $P(x_1,x_0)$, $W(x_1)$, ..., $P(x_N,x_{N-1})$ and $W(x_N)$ are process 0, process 1, process 2, ..., process 2N-1 and process 2N, respectively.

Initial state: The 0th process describes particles distributed according to the initial wavefunction $\psi(x_0,0)$, which is typically chosen as a δ function, $\psi(x,0) = \delta(x - x_0)$.

Diffusive displacement: The DMC algorithm produces $x_1 = x_0 + \sigma\rho_1$, $x_2 = x_1 + \sigma\rho_2$, etc. by generating random numbers $\rho_n$, $n = 1,2,\dots$

Birth-death processes: After each time step, each particle is replaced by a number of replicas given by
$$m_n = \min\{\mathrm{int}[W(x_n) + u],\ 3\} \tag{335}$$
where u is a random number uniformly distributed in [0,1]. If $m_n = 0$, the particle is deleted and the diffusion process is terminated (death). If $m_n = 1$, there is no effect: the particle stays alive and the algorithm takes it to the next diffusion step. If $m_n = 2$, the particle goes to the next diffusion step and another particle starts off a new series at the present location (birth). If $m_n = 3$, the scenario is similar to the previous case, but there are 2 newly born replicas starting off at the current location.

Algorithm: Now, it is time to summarize the algorithmic steps of the DMC:

1) One starts with $N_0$ particles at positions $x^{(i)}_0$, $i = 1,2,\dots,N_0$, which are placed according to the distribution $\psi(x_0,0)$. It is more convenient to choose all replicas to start at the same point $x_0$.

2) Rather than following the fate of each replica, one follows all replicas simultaneously:
$$x^{(j)}_1 = x^{(j)}_0 + \sqrt{\hbar\Delta\tau/m}\ \rho^{(j)}_1,\qquad j = 1,2,\dots,N_0 \tag{336}$$
This is regarded as a one-step diffusion process of the replicas.

3) Once the new position $x^{(j)}_1$ is calculated, one evaluates $W(x^{(j)}_1)$ through (330), and from (335) one determines a set of integers $m^{(j)}_1$ for $j = 1,2,\dots,N_0$. Replicas with $m^{(j)}_1 = 0$ are terminated. Replicas with $m^{(j)}_1 = 1$ are left unaffected. Replicas with $m^{(j)}_1 = 2, 3$ go to the next diffusion step, but 1 or 2 more replicas, respectively, are added to the system at the current position.

4) The number of replicas is counted and $N_1$ is determined.

5) During the combined diffusion and birth-death processes, the distribution of replicas changes in such a way that the coordinate $x^{(j)}_1$ is now distributed according to the probability density $\psi(x,\Delta\tau)$.

6) As a result of the birth-death processes, the total number of replicas, $N_1$, is now different from $N_0$. One wants to have an almost constant number of replicas during the calculations. Therefore, one uses a suitable choice of $E_R$ to compensate for the increase or decrease in the number of replicas. Note that for sufficiently small $\Delta\tau$, (330) can be approximated as $W(x) \approx 1 - (V(x) - E_R)\Delta\tau/\hbar$. Averaging over all replicas,
$$\langle W\rangle_1 \approx 1 - (\langle V\rangle_1 - E_R)\Delta\tau/\hbar \tag{337}$$
with $\langle V\rangle_1 = \frac{1}{N_1}\sum_{j=1}^{N_1}V(x^{(j)}_1)$. One would like $\langle W\rangle_1$ to eventually be unity. Therefore, $E^{(2)}_R$ can be evaluated as (the proof is left for homework)
$$E^{(1)}_R = \langle V\rangle_1 \tag{338}$$
$$E^{(2)}_R = E^{(1)}_R + \frac{\hbar}{\Delta\tau}\left(1 - \frac{N_1}{N_0}\right) \tag{339}$$

The diffusive displacement, the birth-death processes, and the estimation of the new $E_R$ are repeated until $E_R$ and the distribution of replicas converge to stationary values. Then the distribution of replicas can be interpreted as the ground-state wavefunction, and the ground-state energy can be calculated as $E_0 = \lim_{n\to\infty}\langle V\rangle_n$.

XIV. DENSITY-MATRIX APPROACHES

The quantum state of a single particle has thus far been described by a wavefunction $\Psi(x)$ in coordinate and spin space. In this section, we will consider an alternative representation of the quantum state, called the density matrix. The density matrix was originally introduced in quantum statistical mechanics to describe a system for which the state was incompletely specified. Although describing a quantum system with the density matrix is equivalent to using the wavefunction, it has been shown that density matrices are more practical for certain time-dependent problems. The general Nth-order density matrix is formally defined as
$$\gamma_N(x_1x_2\dots x_N,\ x'_1x'_2\dots x'_N) \equiv \Psi_N(x_1,x_2,\dots,x_N)\,\Psi^*_N(x'_1,x'_2,\dots,x'_N) \tag{340}$$
where $x_i = \{r_i,s_i\}$ denotes spatial and spin coordinates.
Note that the density matrix depends on two sets of independent variables, $\{x_i\}$ and $\{x'_i\}$, which together give $\gamma_N$ a numerical value. Equivalently, Eq. 340 can be viewed as the coordinate representation of the density operator,
$$\hat\gamma_N = |\Psi_N\rangle\langle\Psi_N| \tag{341}$$

since
$$\langle x_1x_2\dots x_N|\hat\gamma_N|x'_1x'_2\dots x'_N\rangle = \langle x_1x_2\dots x_N|\Psi_N\rangle\langle\Psi_N|x'_1x'_2\dots x'_N\rangle$$
Note that $\hat\gamma_N$ can also be thought of as the projection operator onto the state $\Psi_N$. We then have for normalized $\Psi_N$,
$$\mathrm{Tr}(\hat\gamma_N) = \int\Psi_N(x^N)\Psi^*_N(x^N)\,dx^N = 1$$
where $x^N$ stands for the set $\{x_i\}_{i=1}^N$. The trace of an operator $\hat A$ is defined as the sum of the diagonal elements of the matrix representing $\hat A$, or the integral if the representation is continuous, as above. It can also be verified that
$$\langle\hat A\rangle = \mathrm{Tr}(\hat\gamma_N\hat A) = \mathrm{Tr}(\hat A\hat\gamma_N)$$
From this, the density operator $\hat\gamma_N$ can be seen to carry the same information as the N-electron wave function $\Psi_N$. Note that while $\Psi$ is defined only up to an arbitrary phase factor, $\hat\gamma_N$ for a state is unique. $\hat\gamma_N$ is also positive semidefinite and Hermitian.

The state of the system is said to be pure if it can be described by a wavefunction, and mixed if it cannot. A system in a mixed state can be characterized by a probability distribution over all accessible pure states. We can think of $\gamma_N$ as an element of a matrix (the density matrix); if we set $x'_i = x_i$ for all i, we get the diagonal elements of the density matrix,
$$\gamma_N(x_1x_2\dots x_N) \equiv \Psi_N(x_1,x_2,\dots,x_N)\Psi^*_N(x_1,x_2,\dots,x_N) = |\Psi_N(x_1,x_2,\dots,x_N)|^2$$
which is the diagonal of the Nth-order density matrix for a pure state. Note that this is also the probability distribution associated with a solution of the Schrödinger equation. We can express the Schrödinger equation in the density-matrix formalism by taking the time derivative of the density operator and using Hermiticity and commutation relations,
$$\frac{\partial}{\partial t}\hat\gamma_N = \left(\frac{\partial}{\partial t}|\Psi_N\rangle\right)\langle\Psi_N| + |\Psi_N\rangle\left(\frac{\partial}{\partial t}\langle\Psi_N|\right) = \frac{\hat H}{i\hbar}|\Psi_N\rangle\langle\Psi_N| - |\Psi_N\rangle\langle\Psi_N|\frac{\hat H}{i\hbar}$$
so that
$$i\hbar\frac{\partial}{\partial t}\hat\gamma_N = [\hat H,\hat\gamma_N] \tag{342}$$
This equation describes how the density operator evolves in time. We can generalize the density operator $\hat\gamma_N$ to the ensemble density operator
$$\hat\Gamma = \sum_i p_i|\Psi_i\rangle\langle\Psi_i| \tag{343}$$
where $p_i$ is the probability of the system being found in the state $\Psi_i$, and the sum is over the complete set of all accessible pure states.
Since $p_i$ is a probability, it has the following properties:
$$p_i \geq 0, \qquad \sum_i p_i = 1$$
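The ensemble density operator of Eq. (343) is easy to build explicitly for a finite-dimensional model. The sketch below assumes a two-level system with two made-up pure states and probabilities; the helper names are introduced here for illustration. It checks that the trace equals 1 and uses $\mathrm{Tr}(\hat\Gamma^2)$, often called the purity, to distinguish a mixed state from a pure one.

```python
import math


def density_matrix(probs, states):
    """Build Gamma = sum_i p_i |psi_i><psi_i| for states given as coefficient lists."""
    dim = len(states[0])
    gamma = [[0.0] * dim for _ in range(dim)]
    for p, psi in zip(probs, states):
        for r in range(dim):
            for c in range(dim):
                gamma[r][c] += p * psi[r] * psi[c].conjugate()
    return gamma


def trace(matrix):
    return sum(matrix[i][i] for i in range(len(matrix)))


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Two normalized (non-orthogonal) pure states of a two-level model system.
up = [1.0, 0.0]
plus = [1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0)]

mixed = density_matrix([0.75, 0.25], [up, plus])   # p_i >= 0 and sum to 1
pure = density_matrix([1.0], [plus])

purity_mixed = trace(matmul(mixed, mixed))
purity_pure = trace(matmul(pure, pure))
print(trace(mixed), purity_mixed, purity_pure)
```

A purity below 1 signals that the operator cannot be written as a single projector $|\Psi\rangle\langle\Psi|$, i.e., that the state is mixed.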

We can then rewrite Eq. 342 in terms of the ensemble density operator to obtain
$$i\hbar\frac{\partial}{\partial t}\hat\Gamma = [\hat H,\hat\Gamma] \tag{344}$$
which is true if $\hat\Gamma$ involves only states with the same number of particles, as is the case in the canonical ensemble. This equation is also known as the von Neumann equation, the quantum mechanical analog of the Liouville equation. For stationary states, $\hat\Gamma$ is independent of time, which means that
$$[\hat H,\hat\Gamma] = 0$$
which implies that $\hat H$ and $\hat\Gamma$ share the same eigenvectors. Work in statistical mechanics deals heavily with systems at thermal equilibrium, where the density matrix is characterized by thermally distributed populations in the quantum states,
$$\hat\rho = \frac{e^{-\beta\hat H}}{Z}$$
where $\beta = 1/k_BT$, $k_B$ is the Boltzmann constant, and Z is the partition function
$$Z = \mathrm{Tr}(e^{-\beta\hat H})$$
In this language, one can express a thermally averaged expectation value as
$$\langle\hat\Omega\rangle = \mathrm{Tr}(\hat\Omega\hat\rho) = \frac{\mathrm{Tr}(\hat\Omega e^{-\beta\hat H})}{Z}$$
With a mixed state, we have less than perfect knowledge of what the quantum state is. We can describe how much less information there is by defining the entropy as
$$S = -k_B\mathrm{Tr}[\hat\rho\ln\hat\rho]$$
The basic Hamiltonian operator, Eq. 19, is a sum of two symmetric one-electron operators and a symmetric two-electron operator, none of them depending on spin. Together with the fact that the wavefunctions $\Psi_N$ are antisymmetric, this means that the expectation values computed with the density operator can be systematically simplified by integrating the probability densities over N - 2 of their variables, giving rise to the concepts of the reduced density matrix and the spinless density matrix.

A. Reduced density matrices

The reduced density matrix of order p is defined as
$$\gamma_p(x_1x_2\dots x_p,\ x'_1x'_2\dots x'_p) = \binom{N}{p}\int\!\cdots\!\int\gamma_N(x_1x_2\dots x_p\,x_{p+1}\dots x_N,\ x'_1x'_2\dots x'_p\,x_{p+1}\dots x_N)\,dx_{p+1}\dots dx_N \tag{345}$$

where $\binom{N}{p}$ is a binomial coefficient, and $\gamma_N$ is the $N$-th order density matrix defined earlier. This is also known as taking the partial trace of the density matrix. For example, the first-order density matrix $\gamma_1$ is defined as

$$\gamma_1(x'_1, x_1) = N \int \cdots \int \Psi(x'_1 x_2 \cdots x_N)\, \Psi^*(x_1 x_2 \cdots x_N)\, dx_2 \cdots dx_N \tag{346}$$

and normalizes to

$$\mathrm{Tr}\, \gamma_1(x'_1, x_1) = \int \gamma_1(x_1, x_1)\, dx_1 = N .$$

Similarly, the second-order density matrix $\gamma_2$ is defined as

$$\gamma_2(x'_1 x'_2, x_1 x_2) = \frac{N(N-1)}{2} \int \cdots \int \Psi(x'_1 x'_2 x_3 \cdots x_N)\, \Psi^*(x_1 x_2 x_3 \cdots x_N)\, dx_3 \cdots dx_N \tag{347}$$

and normalizes to the number of electron pairs,

$$\mathrm{Tr}\, \gamma_2(x'_1 x'_2, x_1 x_2) = \iint \gamma_2(x_1 x_2, x_1 x_2)\, dx_1\, dx_2 = \frac{N(N-1)}{2} .$$

The reduced density matrices $\gamma_1$ and $\gamma_2$ just defined are coordinate-space representations of operators $\hat{\gamma}_1$ and $\hat{\gamma}_2$, acting on the one- and two-particle Hilbert spaces, respectively. We can express the one-particle operator in terms of its eigenvalues and eigenvectors,

$$\hat{\gamma}_1 = \sum_i n_i\, |\psi_i\rangle \langle \psi_i| ,$$

where the eigenvalues $n_i$ are the occupation numbers and the eigenvectors $|\psi_i\rangle$ are the natural spin orbitals. Similarly, the two-particle operator can be expressed as

$$\hat{\gamma}_2 = \sum_i g_i\, |\theta_i\rangle \langle \theta_i| ,$$

where the eigenvalues $g_i$ are the occupation numbers and the eigenvectors $|\theta_i\rangle$ are called natural geminals. It also follows that $n_i \ge 0$ and $g_i \ge 0$. Comparing these two operators with Eq. 343, we can see that $n_i$ is proportional to the probability of the one-electron state $|\psi_i\rangle$ being occupied and $g_i$ is proportional to the probability of the two-electron state $|\theta_i\rangle$ being occupied.

Now let us consider the expectation values of one- and two-electron operators with an antisymmetric $N$-body wavefunction $\Psi$. For a one-electron operator

$$\hat{O}_1 = \sum_{i=1}^{N} O_1(x_i, x'_i)$$

we have

$$\langle \hat{O}_1 \rangle = \mathrm{Tr}(\hat{O}_1 \gamma_N) = \iint O_1(x_1, x'_1)\, \gamma_1(x'_1, x_1)\, dx_1\, dx'_1 . \tag{348}$$
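As a small illustration of Eqs. 346 and 348, the sketch below works in a finite orthonormal spinorbital basis, where $\gamma_1$ becomes an ordinary matrix. It assumes the wavefunction is a single Slater determinant, for which $\gamma_1 = \sum_{i \in \mathrm{occ}} |\phi_i\rangle\langle\phi_i|$; the random symmetric matrix `h` is a hypothetical stand-in for a one-electron operator, and the basis size and electron number are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n_basis, n_elec = 6, 3  # hypothetical basis size and electron number

# Hypothetical one-electron operator (real symmetric) in an orthonormal basis
M = rng.normal(size=(n_basis, n_basis))
h = (M + M.T) / 2

# Orbitals: eigenvectors of h, eigenvalues in ascending order
eps, C = np.linalg.eigh(h)
C_occ = C[:, :n_elec]  # the n_elec lowest orbitals are occupied

# First-order density matrix of the determinant built from the occupied orbitals
gamma1 = C_occ @ C_occ.T

# Natural occupation numbers (eigenvalues of gamma1) and natural orbitals
occ, nat_orbs = np.linalg.eigh(gamma1)

# Expectation value of the one-electron operator as a trace with gamma1
expect_h = np.trace(h @ gamma1)
print(occ, expect_h, eps[:n_elec].sum())
```

For a determinant the occupation numbers are exactly 0 or 1, and the trace reproduces the sum of the occupied orbital energies; a correlated wavefunction would give fractional occupations instead.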

If the one-electron operator is local, i.e. $O_1(r', r) = O_1(r)\, \delta(r' - r)$, we can conventionally write down only the diagonal part; thus

$$\langle \hat{O}_1 \rangle = \mathrm{Tr}(\hat{O}_1 \gamma_N) = \int \big[ O_1(x_1)\, \gamma_1(x'_1, x_1) \big]_{x'_1 = x_1} dx_1 .$$

Similarly, if the two-electron operator is local, we have

$$\hat{O}_2 = \sum_{i<j}^{N} O_2(x_i, x_j)$$

and the corresponding expectation value

$$\langle \hat{O}_2 \rangle = \mathrm{Tr}(\hat{O}_2 \gamma_N) = \iint \big[ O_2(x_1, x_2)\, \gamma_2(x'_1 x'_2, x_1 x_2) \big]_{x'_1 = x_1,\, x'_2 = x_2} dx_1\, dx_2 .$$

We thus obtain for the expectation value of the Hamiltonian, Eq. 19, in terms of density matrices

$$E = \mathrm{Tr}(\hat{H} \hat{\gamma}_N) = E[\gamma_1, \gamma_2] = \int \Big[ \Big( -\tfrac{1}{2} \nabla_1^2 + v(r_1) \Big) \gamma_1(x'_1, x_1) \Big]_{x'_1 = x_1} dx_1 + \iint \frac{1}{|r_1 - r_2|}\, \gamma_2(x_1 x_2, x_1 x_2)\, dx_1\, dx_2 . \tag{349}$$

We can further simplify this result by integrating over the spin variables.

B. Spinless density matrices

The first-order and second-order spinless density matrices are defined by

$$\rho_1(r'_1, r_1) = \int \gamma_1(r'_1 s_1, r_1 s_1)\, ds_1 = N \int \cdots \int \Psi(r'_1 s_1 x_2 \cdots x_N)\, \Psi^*(r_1 s_1 x_2 \cdots x_N)\, ds_1\, dx_2 \cdots dx_N \tag{350}$$

and

$$\rho_2(r'_1 r'_2, r_1 r_2) = \iint \gamma_2(r'_1 s_1 r'_2 s_2, r_1 s_1 r_2 s_2)\, ds_1\, ds_2 = \frac{N(N-1)}{2} \int \cdots \int \Psi(r'_1 s_1 r'_2 s_2 x_3 \cdots x_N)\, \Psi^*(r_1 s_1 r_2 s_2 x_3 \cdots x_N)\, ds_1\, ds_2\, dx_3 \cdots dx_N . \tag{351}$$

We can introduce a shorthand notation for the diagonal elements of $\rho_1$,

$$\rho_1(r_1) = \rho_1(r_1, r_1) = N \int \cdots \int |\Psi|^2\, ds_1\, dx_2 \cdots dx_N$$

and similarly for $\rho_2$,

$$\rho_2(r_1, r_2) = \rho_2(r_1 r_2, r_1 r_2) = \frac{N(N-1)}{2} \int \cdots \int |\Psi|^2\, ds_1\, ds_2\, dx_3 \cdots dx_N .$$

Also note that from the above definitions, we can express the first-order density matrix in terms of the second-order density matrix,

$$\rho_1(r'_1, r_1) = \frac{2}{N-1} \int \rho_2(r'_1 r_2, r_1 r_2)\, dr_2 , \qquad \rho(r_1) = \frac{2}{N-1} \int \rho_2(r_1, r_2)\, dr_2 .$$

The expectation value of the Hamiltonian, Eq. 349, in terms of spinless density matrices now becomes

$$E = E[\rho_1(r'_1, r_1), \rho_2(r_1, r_2)] = \int \Big[ -\tfrac{1}{2} \nabla_r^2\, \rho_1(r', r) \Big]_{r' = r} dr + \int v(r)\, \rho(r)\, dr + \iint \frac{1}{|r_1 - r_2|}\, \rho_2(r_1, r_2)\, dr_1\, dr_2 , \tag{352}$$

where the three terms represent the electronic kinetic energy, the nuclear-electron potential energy, and the electron-electron potential energy, respectively. Note that since we can express the first-order density matrix in terms of the second-order one, only the second-order density matrix is needed for the expectation value of the Hamiltonian.

C. N-representability

From Eq. 349, one may hope to minimize the energy with respect to the density matrices, thus avoiding having to work with the $4N$-dimensional wavefunction. Since only the second-order density matrix is needed for the energy minimization, the trial $\gamma_2$ must correspond to some antisymmetric wavefunction $\Psi$; i.e. for any guessed second-order density matrix $\gamma_2$ there must be a $\Psi$ from which it comes via its definition. This is the $N$-representability problem for the second-order density matrix: a trial density matrix is $N$-representable only if it can be derived in this way from some antisymmetric wavefunction.

It is a difficult task to obtain the necessary and sufficient conditions for a reduced density matrix $\gamma_2$ to be derivable from an antisymmetric wavefunction $\Psi$. Instead, it may be easier to solve the ensemble $N$-representability problem for $\Gamma_2$, where $\Gamma_p$ is the $p$-th order mixed-state (ensemble) density matrix defined as

$$\Gamma_p(x'_1 x'_2 \cdots x'_p, x_1 x_2 \cdots x_p) = \binom{N}{p} \int \cdots \int \Gamma_N(x'_1 x'_2 \cdots x'_p x_{p+1} \cdots x_N, x_1 x_2 \cdots x_p \cdots x_N)\, dx_{p+1} \cdots dx_N . \tag{353}$$
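The normalizations of $\gamma_1$ and $\gamma_2$ and the relation between the first- and second-order matrices can be verified on a toy model. The sketch below is a minimal illustration under stated assumptions: the continuous coordinates $x$ are replaced by a discrete index over a few orthonormal spinorbitals, the wavefunction is a random real antisymmetric tensor `psi[a, b, c]` for $N = 3$, and the helper names `perm_sign` and `antisymmetrize` are hypothetical.

```python
import itertools
import numpy as np


def perm_sign(perm):
    """Parity (+1 or -1) of a permutation given as a tuple of indices."""
    perm = list(perm)
    sign = 1
    for i in range(len(perm)):
        while perm[i] != i:
            j = perm[i]
            perm[i], perm[j] = perm[j], perm[i]  # each swap flips the parity
            sign = -sign
    return sign


def antisymmetrize(t):
    """Sum of sign(P) * (t with axes permuted by P) over all permutations P."""
    out = np.zeros_like(t)
    for perm in itertools.permutations(range(t.ndim)):
        out += perm_sign(perm) * np.transpose(t, perm)
    return out


rng = np.random.default_rng(2)
n_orb, N = 5, 3  # hypothetical: 5 spinorbitals, 3 electrons

psi = antisymmetrize(rng.normal(size=(n_orb,) * N))
psi /= np.linalg.norm(psi)  # sum over all index tuples of |psi|^2 equals 1

# gamma1[a', a] = N * sum_{b,c} psi[a', b, c] * psi[a, b, c]
gamma1 = N * np.einsum('pbc,qbc->pq', psi, psi)

# gamma2[a', b', a, b] = N(N-1)/2 * sum_c psi[a', b', c] * psi[a, b, c]
gamma2 = N * (N - 1) / 2 * np.einsum('pqc,rsc->pqrs', psi, psi)

# First-order matrix recovered from the second-order one by a partial trace
gamma1_from_2 = 2 / (N - 1) * np.einsum('pbqb->pq', gamma2)

occ = np.linalg.eigvalsh(gamma1)  # natural occupation numbers
print(np.trace(gamma1), np.einsum('pqpq->', gamma2), occ)
```

Because the model wavefunction is not a single determinant, the printed occupation numbers are fractional, while the traces still equal $N$ and $N(N-1)/2$.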

Since

$$E_0 = \mathrm{Tr}(\hat{H} \hat{\Gamma}_N^0) \le \mathrm{Tr}(\hat{H} \hat{\Gamma}_N) ,$$

it is completely legitimate to enlarge the class of trial density operators for an $N$-electron problem from a pure-state set to the set of positive unit-trace density operators made up from $N$-electron states. This minimization leads to the $N$-electron ground-state energy and the ground state $\hat{\gamma}_N$ if it is not degenerate, or an arbitrary convex sum $\hat{\gamma}_N$ of all degenerate ground states if it is degenerate. Thus the minimization in Eq. 352 can be done over ensemble $N$-representable $\Gamma_2$. For a given $\hat{\Gamma}_1$,

$$\hat{\Gamma}_1 = \sum_i n_i\, |\psi_i\rangle \langle \psi_i| ,$$

the necessary and sufficient conditions for it to be $N$-representable are that

$$0 \le n_i \le 1 \tag{354}$$

for all of the eigenvalues of $\hat{\Gamma}_1$. This conforms nicely with the Pauli exclusion principle.

Let us now prove this theorem of the necessary and sufficient conditions for the first-order density matrix to be $N$-representable. The necessary conditions for $\Gamma_1$ and $\Gamma_2$ are such that they satisfy Eq. 353 for a proper $\Gamma_N$. The sufficient conditions are those that guarantee the existence of a $\Gamma_N$ that reduces to $\Gamma_1$ or $\Gamma_2$. The set of $\Gamma_1$ or $\Gamma_2$ that simultaneously satisfies both necessary and sufficient conditions is called the set of $N$-representable $\Gamma_1$ or $\Gamma_2$. If the energy is minimized over sets $\{\Gamma_1\}$ and $\{\Gamma_2\}$ satisfying only necessary conditions, an energy lower than the true energy can be obtained (lower bound). If it is minimized over sets satisfying only sufficient conditions, an energy higher than the true energy is obtained (upper bound). If one minimizes the energy over the sets satisfying both the necessary and the sufficient conditions, the ground-state energy is obtained.

The necessary conditions on $\gamma_1$ and $\gamma_2$ imposed by $N$-representability are also called Pauli conditions and are as follows. If $|\psi_i\rangle$ is some normalized spinorbital state and $|\psi_i \psi_j\rangle$ is a normalized $2 \times 2$ Slater determinant built from orthonormal $\psi_i$ and $\psi_j$, then

$$0 \le \langle \psi_i | \hat{\gamma}_1 | \psi_i \rangle \le 1 \tag{355}$$

$$0 \le \langle \psi_i \psi_j | \hat{\gamma}_2 | \psi_i \psi_j \rangle \le 1 \tag{356}$$

In the coordinate representation, with $\Phi_{ij}(x_1, x_2) = \frac{1}{\sqrt{2}} \big[ \psi_i(x_1) \psi_j(x_2) - \psi_j(x_1) \psi_i(x_2) \big]$ denoting the $2 \times 2$ determinant, they can be written as

$$0 \le \iint dx_1\, dx'_1\, \psi_i^*(x'_1)\, \gamma_1(x'_1, x_1)\, \psi_i(x_1) \le 1 \tag{357}$$

$$0 \le \int\!\!\int\!\!\int\!\!\int dx_1\, dx'_1\, dx_2\, dx'_2\, \Phi_{ij}^*(x'_1, x'_2)\, \gamma_2(x'_1 x'_2, x_1 x_2)\, \Phi_{ij}(x_1, x_2) \le 1 \tag{358}$$

Eq. 355 is equivalent to the requirement that the eigenvalues of $\hat{\gamma}_1$ are given by Eq. 354, whereas Eq. 356 is not equivalent to a condition on the eigenvalues of $\hat{\gamma}_2$ since the eigenfunctions in

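general are not $2 \times 2$ Slater determinants.

Let us prove this relation for $\hat{\gamma}_1$. We start by introducing the field creation and annihilation operators, which create and annihilate a particle at the coordinate $x$,

$$\hat{\psi}^\dagger(x) = \sum_i \psi_i^*(x)\, \hat{a}_i^\dagger , \qquad \hat{\psi}(x) = \sum_i \psi_i(x)\, \hat{a}_i ,$$

where

$$\hat{a}_i^\dagger = \int dx\, \psi_i(x)\, \hat{\psi}^\dagger(x) , \qquad \hat{a}_i = \int dx\, \psi_i^*(x)\, \hat{\psi}(x) .$$

Let us consider an arbitrary single-particle operator $\hat{A}(i)$ and do a spectral decomposition,

$$\hat{A}(i) = \sum_{\alpha \beta} |\alpha\rangle \langle \alpha | \hat{A} | \beta \rangle \langle \beta| = \sum_{\alpha \beta} A_{\alpha \beta}\, |\alpha\rangle \langle \beta| .$$

The simplest $N$-particle operator can be constructed as

$$\hat{A}_N = \hat{A}(1) + \hat{A}(2) + \cdots + \hat{A}(N) ,$$

where $\hat{A}(i)$ acts on the states of the $i$th particle,

$$\hat{A}(i)\, |\alpha_1 \cdots \alpha_i \cdots \alpha_N\rangle = \sum_{\beta_i} \langle \beta_i | \hat{A} | \alpha_i \rangle\, |\alpha_1 \cdots \beta_i \cdots \alpha_N\rangle = \sum_{\beta_i} A_{\beta_i \alpha_i}\, |\alpha_1 \cdots \beta_i \cdots \alpha_N\rangle ,$$

giving

$$\hat{A}_N\, |\alpha_1 \cdots \alpha_i \cdots \alpha_N\rangle = \sum_{i=1}^{N} \sum_{\beta_i} A_{\beta_i \alpha_i}\, |\alpha_1 \cdots \beta_i \cdots \alpha_N\rangle .$$

We can instead write $\hat{A}$ in terms of creation and annihilation operators,

$$\hat{A} = \sum_{\alpha \beta} A_{\alpha \beta}\, \hat{a}_\alpha^\dagger \hat{a}_\beta = \iint dx_1\, dx'_1\, A(x_1, x'_1)\, \hat{\psi}^\dagger(x_1)\, \hat{\psi}(x'_1) .$$

The expectation value of $\hat{A}$ is then obtained from

$$\langle \hat{A} \rangle = \langle \Psi | \hat{A} | \Psi \rangle = \iint dx_1\, dx'_1\, A(x_1, x'_1)\, \langle \Psi | \hat{\psi}^\dagger(x_1)\, \hat{\psi}(x'_1) | \Psi \rangle .$$

If we now compare this equation to Eq. 348 and let $\hat{A} = \hat{O}_1$, we find that the first-order density matrix can be written as

$$\gamma_1(x'_1, x_1) = \langle \Psi | \hat{\psi}^\dagger(x_1)\, \hat{\psi}(x'_1) | \Psi \rangle .$$

We can now rewrite Eq. 357 as

$$0 \le \iint dx_1\, dx'_1\, \psi_i^*(x'_1)\, \langle \Psi | \hat{\psi}^\dagger(x_1)\, \hat{\psi}(x'_1) | \Psi \rangle\, \psi_i(x_1) \le 1$$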

or, inserting the orbital expansions of the field operators,

$$0 \le \iint dx_1\, dx'_1\, \psi_i^*(x'_1)\, \langle \Psi | \sum_l \psi_l^*(x_1)\, \hat{a}_l^\dagger \sum_j \psi_j(x'_1)\, \hat{a}_j | \Psi \rangle\, \psi_i(x_1) \le 1 ,$$

which, by orthonormality of the orbitals, can be reduced down to

$$0 \le \langle \Psi | \hat{a}_i^\dagger \hat{a}_i | \Psi \rangle \le 1 ,$$

where $\hat{a}_i^\dagger \hat{a}_i = \hat{N}_i$ is just the number operator, which generates the occupation number for the $i$th orbital. Since $\hat{N}_i$ is a projection operator, its expectation value is always nonnegative and less than or equal to 1. Hence the Pauli condition for the first-order density matrix, Eq. 355, is satisfied.

The sufficient conditions require that there must exist a $\Gamma_N$ that reduces to $\Gamma_1$ and $\Gamma_2$. The proof for $\Gamma_1$ is as follows. First we need a simple lemma about vectors and convex sets. A set is convex if an arbitrary positively weighted average of any two elements in the set also belongs to the set. Next we define an extreme element of a convex set as an element $E$ such that $E = p_1 Y_1 + p_2 Y_2$ implies that $Y_1$ and $Y_2$ are both multiples of $E$. Then the lemma states that the set $L$ of vectors $v = (v_1, v_2, \ldots)$ in a space of arbitrary but fixed dimension with $0 \le v_i \le 1$ and $\sum_i v_i = N$ is convex, and its extreme elements are the vectors with $N$ components equal to 1 and all other components equal to 0.

Given this lemma, it is clear that any $\hat{\gamma}_1$ or $\hat{\Gamma}_1$ satisfying Eq. 354 is an element of a convex set whose extreme elements are those $\hat{\gamma}_1^0$ or $\hat{\Gamma}_1^0$ that have $N$ eigenvalues equal to 1 and the rest equal to 0. Each of these $\hat{\gamma}_1^0$ or $\hat{\Gamma}_1^0$ determines, up to a phase, a determinantal $N$-electron wavefunction and a unique corresponding pure-state density operator $\hat{\gamma}_N^0$. Some positively weighted sum of these $\hat{\gamma}_N^0$ will be the $\hat{\Gamma}_N$ that reduces to the given $\hat{\gamma}_1$ or $\hat{\Gamma}_1$. Hence, sufficiency is proved.

D. Density matrix functional theory

We have studied DFT in depth in an earlier section, where we started from a variational principle that had the electron density $\rho(r)$ as the basic variable. Using the concept of density matrices, we can construct a corresponding density-matrix-functional theory (DMFT), in which the basic variable is the first-order reduced density matrix $\gamma_1(x', x)$, or the first-order spinless density matrix $\rho_1(r', r)$. The main advantage of DMFT over DFT is that the kinetic energy as a functional of the density matrix is known, so there is no need to introduce an auxiliary system. The unknown exchange-correlation part only has to describe the electron-electron interactions, whereas in DFT the kinetic energy part was also included.

As we know, the first-order density matrix determines the density,

$$\rho(r) = \rho_1(r, r) = \int \gamma_1(x', x) \big|_{x' = x}\, ds ,$$

and by the Hohenberg-Kohn theorem, it determines all properties including the energy,

$$E = E[\gamma_1] = E[\rho_1] .$$

The explicit form of the functional is given via a constrained search, where the search can be over all trial density operators,

$$E[\gamma_1] = \min_{\hat{\gamma}_N \to \gamma_1} \mathrm{Tr}(\hat{\gamma}_N \hat{H}) .$$

The variational principle for DMFT can be written as

$$\delta \{ E[\gamma_1] - \mu N[\gamma_1] \} = 0 , \tag{359}$$

where $\mu$ is the chemical potential. This variational principle stands for

$$E_0 = \min_{\gamma_1} E[\gamma_1] . \tag{360}$$

If we parametrize $\gamma_1$ in terms of natural spinorbitals $\psi_i$ and occupation numbers $n_i$, we obtain

$$\mu = \left( \frac{\partial E}{\partial n_i} \right)_{\psi_i,\, n_{j \ne i}} \quad \text{for all } i, \text{ assuming that } 0 < n_i < 1 . \tag{361}$$

If there are any natural orbitals among the complete set of natural orbitals for which the condition $0 < n_i < 1$ of Eq. 361 does not hold, then the above assertion for $\mu$ is not true for them: such an orbital with $n_i = 1$ has $\partial E / \partial n_i \le \mu$. It is a conjecture that Eq. 361 holds for all orbitals in an atom or molecule. If the variation in Eq. 359 is carried out with respect to the natural orbitals themselves, with orthonormalization conditions imposed, the result is a set of coupled differential equations for the natural orbitals, which are very different from the KS equations derived previously.

From Eq. 349, the first-order density matrix determines all components of the energy explicitly except for $V_{ee}[\gamma]$, so the constrained search in Eq. 360 may be restricted to searching for the $\gamma_N$ that minimizes $V_{ee}$, or since $\gamma_N \to \gamma_2 \to \gamma_1$, we could write

$$V_{ee}[\gamma_1] = \min_{\hat{\gamma}_N \to \gamma_1} \mathrm{Tr}(\hat{\gamma}_N \hat{V}_{ee}) ,$$

$$V_{ee}[\gamma_1] = \min_{\gamma_2 \to \gamma_1} \iint \frac{1}{r_{12}}\, \gamma_2(x_1 x_2, x_1 x_2)\, dx_1\, dx_2 ,$$

where the trial $\gamma_2$ would have to satisfy the $N$-representability conditions outlined in the previous section.

E. Contracted Schrödinger equation

The many-body Hamiltonian operator is the sum of one- and two-electron operators, which is why the energy of an $N$-electron system can be expressed as a functional of the 2-RDM, which depends on the variables of only two electrons. This gives us a method of studying the structure of electronic systems by determining the 2-RDM instead of the $N$-electron wavefunction. The main question is whether the Schrödinger equation can be mapped into the two-electron space, and if so, what the properties of the resulting equations would be. There were two approaches to answering this question: one method is to integrate the Schrödinger equation obtained in first quantization, resulting in what is called the density equation; the second method is to apply a contracting mapping to the matrix representation of the Schrödinger equation, obtaining what is known as the contracted Schrödinger equation (CSE). It was found that although the two equations look very different, they are in fact equivalent. In this section, we will focus on the contracted Schrödinger equation.

We begin with an introduction to the notation. The many-body Hamiltonian can be written as

$$H = \sum_{ij} h_{ij}\, a_i^\dagger a_j + \frac{1}{2} \sum_{ijkl} \langle ij | kl \rangle\, a_i^\dagger a_j^\dagger a_l a_k ,$$

where the one-electron basis is assumed to be finite and formed by $2K$ orthogonal spinorbitals. We can rewrite it as

$$H = \frac{1}{2} \sum_{ijkl} K_{ijkl}\, a_i^\dagger a_j^\dagger a_l a_k , \tag{362}$$

where the elements of the two-particle reduced Hamiltonian matrix are given by

$$K_{ijkl} = \frac{1}{N-1} \left( h_{ik} \delta_{jl} + h_{jl} \delta_{ik} \right) + \langle ij | kl \rangle .$$

The reduced Hamiltonian has the same properties as the two-electron integral matrix.

In the second-quantization formalism, the $p$-order reduced density matrix ($p$-RDM) can be written as

$${}^{p}D^{\Psi \Psi'}_{i_1, i_2, \ldots, i_p;\, j_1, j_2, \ldots, j_p} = \frac{1}{p!} \langle \Psi | a_{i_1}^\dagger a_{i_2}^\dagger \cdots a_{i_p}^\dagger\, a_{j_p} \cdots a_{j_2} a_{j_1} | \Psi' \rangle . \tag{363}$$

When $\Psi \ne \Psi'$, this expression defines an element of the $p$-order transition reduced density matrix ($p$-TRDM). We will work with the case of pure states with $\Psi = \Psi'$,

$${}^{p}D^{\Psi}_{i_1, i_2, \ldots, i_p;\, j_1, j_2, \ldots, j_p} = \frac{1}{p!} \langle \Psi | a_{i_1}^\dagger a_{i_2}^\dagger \cdots a_{i_p}^\dagger\, a_{j_p} \cdots a_{j_2} a_{j_1} | \Psi \rangle , \tag{364}$$

where the complementary matrix to the $p$-RDM is the $p$-order holes reduced density matrix ($p$-HRDM),

$${}^{p}\bar{D}^{\Psi}_{i_1, i_2, \ldots, i_p;\, j_1, j_2, \ldots, j_p} = \frac{1}{p!} \langle \Psi | a_{j_p} \cdots a_{j_2} a_{j_1}\, a_{i_1}^\dagger a_{i_2}^\dagger \cdots a_{i_p}^\dagger | \Psi \rangle , \tag{365}$$

where hole implies that $\Psi$ itself is the reference state. This second-quantization formalism for the density matrix is equivalent to the expressions in the previous sections. The equivalence is shown using the field creation and annihilation operators formulated in section C. Integrating the $N$-electron density matrix,

$${}^{N}D_{1, 2, \ldots, N;\, 1', 2', \ldots, N'} = \Psi(1, 2, \ldots, N)\, \Psi^*(1', 2', \ldots, N') ,$$

over coordinates $(p+1)$ to $N$ defines the $p$-RDM,

$${}^{p}D^{\Psi}_{1, 2, \ldots, p;\, 1', 2', \ldots, p'} = \int \Psi(1, 2, \ldots, N)\, \Psi^*(1', 2', \ldots, p', p+1, \ldots, N)\, d(p+1) \cdots dN ,$$

where we have changed notation for convenience, $(i_1, i_2, \ldots, i_p;\, j_1, j_2, \ldots, j_p) \to (1, 2, \ldots, p;\, 1', 2', \ldots, p')$.

Let us begin with a quantum system of $N$ fermions characterized by the Schrödinger equation (SE)

$$H |\psi_n\rangle = E_n |\psi_n\rangle ,$$

where the wavefunction $\psi_n$ depends on the coordinates of the $N$ particles. We will use second quantization to derive the contracted Schrödinger equation, emphasizing the use of test functions for contracting the SE onto a lower particle space. Nakatsuji's theorem tells us that there is a one-to-one mapping between the $N$-representable RDM solutions of the CSE and the wavefunction solutions of the SE. The proof will be covered as homework.

Because the $N$-particle Hamiltonian contains only two-electron excitations, the expectation value of $H$ yields an energy

$$E = \frac{1}{2} \sum_{ijkl} K_{ijkl}\, \langle \psi | a_i^\dagger a_j^\dagger a_l a_k | \psi \rangle = \sum_{ijkl} K_{ijkl}\; {}^{2}D^{\psi}_{ij,kl} ,$$

where

$${}^{2}D^{\psi}_{ij,kl} = \frac{1}{2} \langle \psi | a_i^\dagger a_j^\dagger a_l a_k | \psi \rangle .$$

Next define functions to test the two-electron space,

$$\langle \Phi^{ij}_{kl} | = \langle \psi | a_i^\dagger a_j^\dagger a_l a_k .$$

If we take the inner product of the test functions with the SE, we obtain

$$\langle \psi | a_i^\dagger a_j^\dagger a_l a_k\, H | \psi \rangle = E\, \langle \psi | a_i^\dagger a_j^\dagger a_l a_k | \psi \rangle = 2E\; {}^{2}D_{ij,kl} .$$

Substituting the Hamiltonian, Eq. 362, into the above, we obtain

$$\frac{1}{2} \sum_{pqst} K_{pqst}\, \langle \psi | a_i^\dagger a_j^\dagger a_l a_k\, a_p^\dagger a_q^\dagger a_t a_s | \psi \rangle = 2E\; {}^{2}D_{ij,kl} .$$

Rearranging creation and annihilation operators to produce RDMs, we generate the 2,4-CSE

$$\left( \underline{K}\; \underline{{}^{2}D^{\Psi}} \right)_{ij,kl} + 3 \sum_{pqt} \left( K_{pq,it}\; {}^{3}D^{\Psi}_{pqj,ktl} + K_{pq,jt}\; {}^{3}D^{\Psi}_{pqi,ltk} \right) + 6 \sum_{pqst} K_{pq,st}\; {}^{4}D^{\Psi}_{pqij,stkl} = E\; {}^{2}D^{\Psi}_{ij,kl} ,$$

where the first term comes from considering two pairs of indices being the same (this removes two pairs of operators, leaving us with the 2-RDM). The second term comes from considering one pair of indices being the same (this removes one pair of operators, leaving us with the two 3-RDM terms). The factor of 3 in front comes from the different ways the operators can be rearranged. The last term comes from considering no indices being the same (this allows us to commute the operators so that all creation operators are on the left and all annihilation operators are on the right). The factor of 6 comes from the possible arrangements of the indices. The underlines in the first term indicate matrices.

We can see that this equation depends on the 3-RDM and 4-RDM, which makes it indeterminate. If we knew how to build higher-order RDMs from the 2-RDM, this equation would allow us to solve iteratively for the 2-RDM. This class of problems is called the reconstruction problem, and two approaches have been explored. One approach is to explicitly represent the 3-RDM and 4-RDM as functionals of the 2-RDM, and the other is to construct a family of 4-RDMs from the 2-RDM by imposing ensemble representability conditions.

Besides this difficulty, we also have to ask whether the solutions of this equation coincide with those of the SE. The derivation of the 2-CSE shows that the SE implies the 2-CSE, but does the converse hold true? The answer is yes; we must show that the $p$-CSE for $p \ge 2$ is equivalent to the SE by stating and proving the following theorem.

Theorem [Nakatsuji]: If the RDMs are $N$-representable, then the $p$-CSE is satisfied by the $p$-, $(p+1)$-, and $(p+2)$-RDM if and only if the $N$-DM satisfies the Schrödinger equation.

The proof of this theorem will be a homework problem; we only outline it here. First, the SE is satisfied if and only if the following dispersion relation is satisfied:

$$\langle \Psi | H^2 | \Psi \rangle - \langle \Psi | H | \Psi \rangle^2 = 0 .$$

Therefore, we must prove that the 2-CSE in second-quantized form implies the dispersion relation. Since for $p > 2$ the $p$-CSE implies the 2-CSE, the demonstration is also valid for the higher-order equations. Note that this is not valid for the 1-CSE, since the Hamiltonian includes two-electron terms.

One of the important consequences of the equivalence of the 2-CSE and higher-order CSEs with the SE is that the CSEs may be applied to the study of excited states.
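The energy expression $E = \sum_{ijkl} K_{ijkl}\, {}^{2}D_{ij,kl}$ built from the reduced Hamiltonian of Eq. 362 can be checked numerically on a tiny model. The following is a sketch under stated assumptions, not the method of the notes: it uses a Jordan-Wigner matrix representation of the fermion operators, four spinorbitals, two electrons, and random real integrals `h` and `g` (with `g[i, j, k, l]` standing in for $\langle ij|kl\rangle$), all of which are hypothetical choices.

```python
import numpy as np
from functools import reduce


def annihilation_operators(n):
    """Jordan-Wigner matrices a_0 .. a_{n-1} acting on the 2^n-dimensional Fock space."""
    I2 = np.eye(2)
    Z = np.diag([1.0, -1.0])
    lower = np.array([[0.0, 1.0], [0.0, 0.0]])  # maps occupied -> empty
    return [reduce(np.kron, [Z] * p + [lower] + [I2] * (n - p - 1)) for p in range(n)]


n, N = 4, 2  # hypothetical: 4 spinorbitals, 2 electrons
a = annihilation_operators(n)
ad = [op.T for op in a]  # creation operators (real matrices, so transpose suffices)
idx = range(n)

rng = np.random.default_rng(3)
h = rng.normal(size=(n, n))
h = (h + h.T) / 2
g = rng.normal(size=(n, n, n, n))
g = g + g.transpose(1, 0, 3, 2)  # <ij|kl> = <ji|lk>
g = g + g.transpose(2, 3, 0, 1)  # <ij|kl> = <kl|ij> (real, Hermitian H)

# Second-quantized Hamiltonian on the full Fock space
H = sum(h[i, j] * ad[i] @ a[j] for i in idx for j in idx)
H = H + 0.5 * sum(g[i, j, k, l] * ad[i] @ ad[j] @ a[l] @ a[k]
                  for i in idx for j in idx for k in idx for l in idx)

# Restrict to the N-electron sector and take the ground state
number = np.diag(sum(ad[i] @ a[i] for i in idx))
sector = np.isclose(number, N)
evals, evecs = np.linalg.eigh(H[np.ix_(sector, sector)])
E0 = evals[0]
psi = np.zeros(2 ** n)
psi[sector] = evecs[:, 0]

# 2-RDM: D[i, j, k, l] = (1/2) <psi| a_i^+ a_j^+ a_l a_k |psi>
D = np.empty((n, n, n, n))
for i in idx:
    for j in idx:
        for k in idx:
            for l in idx:
                D[i, j, k, l] = 0.5 * psi @ ad[i] @ ad[j] @ a[l] @ a[k] @ psi

# Reduced Hamiltonian K_ijkl = (h_ik d_jl + h_jl d_ik) / (N - 1) + <ij|kl>
eye = np.eye(n)
K = (np.einsum('ik,jl->ijkl', h, eye) + np.einsum('jl,ik->ijkl', h, eye)) / (N - 1) + g

E_rdm = np.einsum('ijkl,ijkl->', K, D)
print(E0, E_rdm)
```

The two printed energies agree, and the trace $\sum_{ij} D_{ij,ij}$ equals the number of electron pairs, $N(N-1)/2$. Note that the factor $1/(N-1)$ in `K` makes the identity valid only within the $N$-electron sector.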

XV. DENSITY MATRIX RENORMALIZATION GROUP (DMRG)

The Density Matrix Renormalization Group (DMRG) is an approximation method which was first conceived by Steven White in 1992 as a way to handle strongly correlated quantum lattices. In the context of molecular physics, the renormalization group structure is often ignored, and DMRG is viewed as a special wave-function ansatz. We will start by writing the wave-function as

$$|\Psi\rangle = |\Psi_{HF}\rangle + |\Psi_{corr}\rangle , \tag{366}$$

$$\langle \Psi_{HF} | \Psi \rangle = 1 , \tag{367}$$

where $\Psi_{HF}$ is the Hartree-Fock wave-function, and $\Psi_{corr}$ is the correlation part. In most methods, it is assumed that $\Psi_{corr}$ is small compared to the exact wave function. Issues arise when the coefficients in the expansion of $\Psi_{corr}$ are on the order of or larger than unity. The primary challenge of strongly correlated systems is that there are a large number of determinants that contribute significantly to the wave function. The goal of DMRG is to overcome this complexity by encoding the idea of locality into a wave-function which may include all possible determinants. In other words, most of the quantum phase space is not explored by physical ground states, which makes the strongly correlated problem far more tractable. Currently DMRG is able to describe about 40 electrons in 40 orbitals for compact molecules, and in elongated molecules it can describe about 100 electrons in 100 orbitals.

A. Singular value decomposition

When discussing DMRG, singular value decomposition (SVD) proves to be an indispensable method, able to break down the coefficient tensor of a strongly correlated wave function. SVD decomposes any $M \times N$ matrix $\psi$ ($M \ge N$) as

$$\psi = U D V^T , \tag{368}$$

where $U$ is an $M \times M$ column-orthogonal matrix, $V$ is a square $N \times N$ column-orthogonal matrix,

$$U^T U = I_{M \times M} , \tag{369}$$

$$V^T V = I_{N \times N} , \tag{370}$$

and $D$ is an $M \times N$ diagonal matrix whose elements $\sigma_1, \ldots, \sigma_N$ are arranged in decreasing order.
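The defining properties of Eqs. 368-370 can be illustrated with NumPy's `numpy.linalg.svd`. This is a minimal sketch, assuming an arbitrary random real $5 \times 3$ matrix; with `full_matrices=True` the routine returns a square $M \times M$ matrix `U`, the singular values as a one-dimensional array in decreasing order, and $V^T$ rather than $V$.

```python
import numpy as np

rng = np.random.default_rng(4)
M, N = 5, 3  # hypothetical matrix dimensions with M >= N
psi = rng.normal(size=(M, N))

# U is M x M, sigma has length N (decreasing order), Vt is V transposed (N x N)
U, sigma, Vt = np.linalg.svd(psi, full_matrices=True)

# Build the rectangular M x N diagonal matrix D of Eq. 368
D = np.zeros((M, N))
D[:N, :N] = np.diag(sigma)

reconstructed = U @ D @ Vt
print(sigma)
print(np.allclose(reconstructed, psi))
```

Passing `full_matrices=False` instead returns the "thin" form, in which `U` is $M \times N$ and the diagonal matrix is square.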

The columns of $U$ are chosen as the eigenvectors of $\psi \psi^T$, the columns of $V$ are chosen as the eigenvectors of $\psi^T \psi$, and the singular values are the square roots of the eigenvalues of $\psi^T \psi$. The SVD is in general not unique, and in many texts the diagonal matrix is defined to be square and either the left or right orthogonal matrix is defined to be rectangular, but all of these formulations are equivalent.

In order to prove the SVD we must first show that the eigenvalues of the real symmetric matrix $\psi^T \psi$ are real and non-negative, and that its eigenvectors are orthogonal. Given a real-valued $\psi$ with dimensions $M \times N$, we may construct $\psi^T \psi$, which is real symmetric. Given any eigenvalue $\lambda$ and corresponding normalized eigenvector $\vec{x}$, we will show that $\lambda$ must be real:

$$\langle \vec{x}, \psi^T \psi\, \vec{x} \rangle = \lambda \langle \vec{x}, \vec{x} \rangle = \lambda , \tag{371}$$

but

$$\langle \vec{x}, \psi^T \psi\, \vec{x} \rangle = \langle \psi^T \psi\, \vec{x}, \vec{x} \rangle = \lambda^* \langle \vec{x}, \vec{x} \rangle = \lambda^* , \tag{372}$$

which means that $\lambda = \lambda^*$ is real. Now we will show that these eigenvalues are non-negative:

$$\lambda = \langle \vec{x}, \psi^T \psi\, \vec{x} \rangle = \langle \psi \vec{x}, \psi \vec{x} \rangle = \| \psi \vec{x} \|^2 \ge 0 . \tag{373}$$

Now let us assume that the eigenvalues are distinct, and take any two of them $(\lambda, \mu)$ with their corresponding normalized eigenvectors $(\vec{x}, \vec{y})$. From this we will show that the eigenvectors are orthogonal:

$$\lambda \langle \vec{y}, \vec{x} \rangle = \langle \vec{y}, \psi^T \psi\, \vec{x} \rangle = \langle \psi^T \psi\, \vec{y}, \vec{x} \rangle = \mu \langle \vec{y}, \vec{x} \rangle ,$$

$$(\lambda - \mu) \langle \vec{y}, \vec{x} \rangle = 0 \;\Rightarrow\; \langle \vec{y}, \vec{x} \rangle = 0 ,$$

since $\mu \ne \lambda$. If some of the eigenvalues are degenerate, the eigenvectors belonging to a degenerate eigenvalue can be chosen linearly independent, and they are orthogonal to all eigenvectors corresponding to different eigenvalues by the argument above; we may therefore apply the Gram-Schmidt process to them and produce a set of orthogonal vectors for the degenerate eigenvalues.

Now the proof of SVD is as follows. For all non-zero eigenvalues $\lambda_i$ ($i \in \{1, \ldots, r\}$) of $\psi^T \psi$, ordered from largest to smallest, with eigenvectors $\vec{x}_i$, we define $\sigma_i = \sqrt{\lambda_i}$ and $\vec{u}_i = \psi \vec{x}_i / \sigma_i$. Then

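$\langle \vec{u}_i, \vec{u}_j \rangle = \delta_{i,j}$. These $\vec{u}_i$ may therefore be extended to a basis for $\mathbb{R}^M$. Now construct a matrix $U$ out of these $\vec{u}_i$ by taking each of these vectors in order as the columns of $U$, and then define the matrix $V$ in the same way but using the $\vec{x}_i$ as its columns. From this definition we see that

$$(U^T \psi V)_{i,j} = \vec{u}_i^{\,T} \psi\, \vec{x}_j = \begin{cases} 0 , & i > r \text{ or } j > r \\ \dfrac{\vec{x}_i^{\,T} \psi^T \psi\, \vec{x}_j}{\sigma_i} = \sigma_i\, \delta_{i,j} , & \text{else} \end{cases} \;=\; D_{i,j} . \tag{374}$$

From this we have shown that $\psi = U D V^T$.

B. SVD applications

Let us now apply SVD to a quantum system $A$, which is described by an $M$-dimensional orthogonal basis $\{ |i\rangle_A \}_{i=1}^{M}$ and is surrounded by an environment $B$ that is described by an $N$-dimensional orthogonal basis $\{ |j\rangle_B \}_{j=1}^{N}$. The state of this combined time-independent system can be represented as

$$|\Psi\rangle = \sum_{i=1}^{M} \sum_{j=1}^{N} \psi_{i,j}\, |i\rangle_A |j\rangle_B , \tag{375}$$

where $\psi$ is real.

Now suppose that some operator $O$ acts on the quantum system but not on the environment. The expectation value of this operator may be written as

$$\langle O \rangle = \sum_{i,i'=1}^{M} \sum_{j,j'=1}^{N} \psi_{i,j}\, \psi_{i',j'}\, \langle i | O | i' \rangle\, \delta_{j,j'} = \sum_{i,i'=1}^{M} \sum_{j=1}^{N} \psi_{i,j}\, \psi_{i',j}\, \langle i | O | i' \rangle = \sum_{i,i'=1}^{M} \rho^{A}_{i',i}\, O_{i,i'} = \mathrm{Tr}_A[\rho_A O] , \tag{376}$$

where $\rho_A = \mathrm{Tr}_B[\rho] = \mathrm{Tr}_B[ |\Psi\rangle \langle \Psi| ]$, with elements $\rho^{A}_{i,i'} = \sum_j \psi_{i,j}\, \psi_{i',j}$, is the reduced density matrix for this system.

Now as an example let us assume that we have a specific case of the system described before, where one spin in system $A$ is in contact with some environment $B$ which also contains a spin,

$$|\Psi\rangle = \frac{1}{5} \left( 4\, |\!\uparrow\rangle_A |\!\downarrow\rangle_B + 3\, |\!\downarrow\rangle_A |\!\uparrow\rangle_B \right) . \tag{377}$$

The resultant coefficient matrix is then

$$\psi = \frac{1}{5} \begin{pmatrix} 0 & 4 \\ 3 & 0 \end{pmatrix} . \tag{378}$$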
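The reduced-density-matrix formula of Eq. 376 can be checked directly on the two-spin coefficient matrix of Eq. 378, here taken with the $1/5$ normalization of Eq. 377. In this sketch the system operator `O` is a hypothetical choice (a $\sigma_z$-like matrix plus an off-diagonal part), and the full-space expectation value is evaluated with a Kronecker product for comparison.

```python
import numpy as np

# Coefficient matrix psi[i, j]: row index = system A, column index = environment B
psi = np.array([[0.0, 4.0],
                [3.0, 0.0]]) / 5.0

# Reduced density matrix of the system: rho_A[i, i'] = sum_j psi[i, j] * psi[i', j]
rho_A = psi @ psi.T

# Hypothetical operator acting on the system only
O = np.array([[1.0, 0.5],
              [0.5, -1.0]])

# Full-space evaluation <Psi| O (x) 1_B |Psi>; flattening row-major matches np.kron ordering
Psi = psi.reshape(-1)
expect_full = Psi @ np.kron(O, np.eye(2)) @ Psi

# Reduced evaluation Tr_A[rho_A O]
expect_reduced = np.trace(rho_A @ O)

# The singular values of psi are the square roots of the eigenvalues of rho_A
U, sigma, Vt = np.linalg.svd(psi)
rho_from_svd = (U * sigma**2) @ U.T

print(rho_A)
print(expect_full, expect_reduced)
print(sigma)
```

Both evaluations give the same expectation value, and the eigenvalues of `rho_A` (16/25 and 9/25) are the squared singular values of `psi`.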

This coefficient matrix has the singular value decomposition

$$U = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} , \qquad D = \frac{1}{5} \begin{pmatrix} 4 & 0 \\ 0 & 3 \end{pmatrix} , \qquad V = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} , \tag{379}$$

and from this we may show (proved in the homework for the general case) that

$$\rho_A = \frac{16}{25}\, \vec{u}_1 \vec{u}_1^{\,T} + \frac{9}{25}\, \vec{u}_2 \vec{u}_2^{\,T} . \tag{380}$$

Now suppose we were to approximate this RDM with only its largest eigenvalue (the most probable state). Clearly this approximation would yield inexact expectation values, but in general, as the coefficient matrices become sufficiently large and a smaller proportion of the eigenvectors is truncated, approximations of this type become better and better until the solutions converge to the exact case. This is the real essence of DMRG. We use SVD in order to lower the dimensionality of the space over which we search for ground states by only searching over the most probable states.

C. DMRG wave function

To understand the DMRG wave function, let us start with the FCI wave function expanded in a complete basis of determinants,

$$|\Psi\rangle = \sum_{\{s_j\}} \psi^{s_1 s_2 \cdots s_L}\, |s_1 s_2 \cdots s_L\rangle , \tag{381}$$

$$s_j \in \{ |\mathrm{vac}\rangle, |\!\uparrow\rangle, |\!\downarrow\rangle, |\!\uparrow\downarrow\rangle \} . \tag{382}$$

Here $|s_1 s_2 \cdots s_L\rangle$ describes the occupancy of $L$ orbitals, and the coefficient tensor $\psi$ in the expansion above has dimension $4^L$. This problem becomes intractable as $L$ gets large, since unlike in normal CI we are unable to truncate this tensor, because sparsity is not assumed for strongly correlated systems.

The FCI tensor may be exactly decomposed by singular value decomposition as follows:

$$\psi^{s_1 s_2 \cdots s_L} = \sum_{\alpha_1} U[1]_{s_1, \alpha_1}\, s[1]_{\alpha_1}\, V[1]_{\alpha_1, s_2 \cdots s_L} , \tag{383}$$

$$A[1]^{s_1}_{\alpha_1} \equiv U[1]_{s_1, \alpha_1}\, s[1]_{\alpha_1} . \tag{384}$$

And we may again do this to decompose $V[1]$, grouping $\alpha_1$ and $s_2$ into a single row index:

$$V[1]_{\alpha_1, s_2 \cdots s_L} = \sum_{\alpha_2} U[2]_{\alpha_1 s_2, \alpha_2}\, s[2]_{\alpha_2}\, V[2]_{\alpha_2, s_3 \cdots s_L} . \tag{385}$$

$$A[2]^{s_2}_{\alpha_1, \alpha_2} \equiv U[2]_{\alpha_1 s_2, \alpha_2}\, s[2]_{\alpha_2} . \tag{386}$$

We may continue doing this until the coefficient tensor is exactly decomposed,

$$\psi^{s_1 s_2 \cdots s_L} = \sum_{\{\alpha_k\}} A[1]^{s_1}_{\alpha_1}\, A[2]^{s_2}_{\alpha_1, \alpha_2} \cdots A[L]^{s_L}_{\alpha_{L-1}} = \mathrm{Tr}\big[ A[1]^{s_1} A[2]^{s_2} \cdots A[L]^{s_L} \big] . \tag{387}$$

This form is useful since, instead of variationally optimizing the FCI tensor, we may optimize over the tensors of its decomposition and truncate the virtual ($\alpha$) dimension. As the dimension $D$ of the virtual index is increased, the MPS ansatz includes a larger region of the full Hilbert space until it exactly captures the original FCI wave-function.

D. Expectation values and diagrammatic notation

At this point it is convenient to introduce diagrammatic notation for tensors. In tensor networks, objects are denoted by shapes with lines connecting them. The number of lines determines what type of object is being dealt with. Scalars have no lines, vectors have one line, matrices have two, and higher-dimensional tensors have more.

FIG. 9. Diagrammatic notation for tensors.

With this notation we may represent the FCI tensor:

FIG. 10. Diagrammatic notation for FCI tensor and MPS tensor.

With this notation we may compute overlap integrals by contracting two tensors with like indices.

FIG. 11. Diagrammatic notation for FCI contractions and MPS overlaps.

Lastly, we may use this notation to represent operators and their expectation values. Given a many-body operator $O$, we may write it as

$$O = \sum_{\{r_j, s_j\}} O^{r_1 r_2 \cdots r_L}_{s_1 s_2 \cdots s_L}\, |r_1 r_2 \cdots r_L\rangle \langle s_1 s_2 \cdots s_L| . \tag{388}$$

FIG. 12. Diagrammatic notation for an arbitrary operator.

Expectation values are calculated by contracting the open indices on each side with the appropriate wave-functions.
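The successive-SVD construction of Eqs. 383-387 can be sketched in a few lines of NumPy. This is an illustrative sketch rather than a DMRG implementation: the function names `tensor_to_mps` and `mps_to_tensor`, the random coefficient tensor, and the choice $L = 4$ with four local states are assumptions made here. Each site tensor is stored with shape (left bond, physical index, right bond), the singular values are absorbed into the left factor as in Eq. 384, and the optional `max_bond` argument truncates the virtual dimension by keeping only the largest singular values.

```python
import numpy as np


def tensor_to_mps(psi, max_bond=None):
    """Decompose an L-index tensor into site tensors A[k] of shape (left, phys, right)."""
    L = psi.ndim
    dims = psi.shape
    sites = []
    rest = psi.reshape(1, -1)  # plays the role of V[k] in Eqs. 383 and 385
    left = 1
    for k in range(L - 1):
        # Group (left bond, physical index of site k) into the row index
        mat = rest.reshape(left * dims[k], -1)
        U, s, Vt = np.linalg.svd(mat, full_matrices=False)
        if max_bond is not None:
            U, s, Vt = U[:, :max_bond], s[:max_bond], Vt[:max_bond]
        sites.append((U * s).reshape(left, dims[k], -1))  # A[k] = U[k] s[k]
        rest = Vt
        left = len(s)
    sites.append(rest.reshape(left, dims[-1], 1))
    return sites


def mps_to_tensor(sites):
    """Contract all virtual indices to recover the full coefficient tensor."""
    out = sites[0]
    for A in sites[1:]:
        out = np.tensordot(out, A, axes=([-1], [0]))
    return out[0, ..., 0]


rng = np.random.default_rng(5)
L, d = 4, 4  # hypothetical: 4 orbitals, 4 local states each
psi = rng.normal(size=(d,) * L)
psi /= np.linalg.norm(psi)

exact = tensor_to_mps(psi)
truncated = tensor_to_mps(psi, max_bond=4)

err_exact = np.linalg.norm(mps_to_tensor(exact) - psi)
err_trunc = np.linalg.norm(mps_to_tensor(truncated) - psi)
print([A.shape for A in exact])
print(err_exact, err_trunc)
```

Without truncation the contraction reproduces the tensor to machine precision, with a central bond dimension of 16; limiting the bond dimension to 4 gives a compact but approximate representation. A random tensor is a worst case for truncation, whereas physical ground states are expected to have rapidly decaying singular values.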

FIG. 13. Diagrammatic notation for an expectation value.

E. Matrix product ansatz

In DMRG it is assumed that the wave-function can be written as a matrix product state as described before (MPS ansatz), where the virtual dimensions are truncated to $D$. This wave-function is invariant to a number of transformations. Between any two $A[i]$ we may place the identity without changing the wave-function. From this the DMRG wave function may be written in canonical form as

$$|\Psi\rangle = \sum_{\{s_j\}} \mathrm{Tr}\big[ L[1]^{s_1} \cdots L[p-1]^{s_{p-1}}\, C[p]^{s_p}\, R[p+1]^{s_{p+1}} \cdots R[L]^{s_L} \big]\, |s_1 s_2 \cdots s_L\rangle , \tag{389}$$

where the $L$'s and $R$'s satisfy the orthogonality conditions

$$\sum_{s_j, \alpha_{k-1}} \big( L^{s_j}[k] \big)^{\dagger}_{\alpha_k, \alpha_{k-1}}\, L^{s_j}[k]_{\alpha_{k-1}, \beta_k} = \delta_{\alpha_k, \beta_k} , \tag{390}$$

$$\sum_{s_j, \alpha_k} R^{s_j}[k]_{\alpha_{k-1}, \alpha_k}\, \big( R^{s_j}[k] \big)^{\dagger}_{\alpha_k, \beta_{k-1}} = \delta_{\alpha_{k-1}, \beta_{k-1}} . \tag{391}$$

From these $L^{s_i}$ and $R^{s_i}$ operators we may define sets of renormalized many-particle basis states $\{l\}$, $\{r\}$, where

$$|\alpha^{L}_{p-1}\rangle = \sum_{\{s_j\} \{\alpha_1 \cdots \alpha_{p-2}\}} L[1]^{s_1}_{\alpha_1} \cdots L[p-1]^{s_{p-1}}_{\alpha_{p-2}, \alpha_{p-1}}\, |s_1 s_2 \cdots s_{p-1}\rangle , \tag{392}$$
