Full Configuration Interaction Monte Carlo Studies of Quantum Dots

Size: px

Start display at page:

Download "Full Configuration Interaction Monte Carlo Studies of Quantum Dots"

Jocelyn Jordan
5 years ago
Views:

SCIENCE (Master n Computatonal Physcs) Faculty of Mathematcs

1 Full Confguraton Interacton Monte Carlo Studes of Quantum Dots by Karl Roald Lekanger THESIS for the degree of MASTER OF SCIENCE (Master n Computatonal Physcs) Faculty of Mathematcs and Natural Scences Department of Physcs Unversty of Oslo June 2013

2 2

3 Ths thess s dedcated to all near and dear for puttng up wth me durng the last year. And especally to Lsen, Oskar and Kaspar. I would also lke to thank Morten Hjorth-Jensen for the supervson and the motvaton, Smen Kvaal for helpng me out wth hs artcle and to Sarah, Sgve, Jørgen, Veronka, Gustav and everybody n the corrdor for a good tme. 3

4 Contents 1 Introducton Motvaton Achevements The structure of the thess I General theory 10 2 A bref ntroducton to Quantum Mechancs The postulates of quantum mechancs Antsymmetrc state vectors and fermons Spn and the spn statstcs theorem The many partcle bass The many partcle operators The varatonal prncple A bref ntroducton to the second quantzaton formalsm Introducton Operators n the second quantzaton notaton The tme ndependent Wck s theorem Partcle- hole formalsm Mathematcal Modellng of two dmensonal Quantum dots Introducton The Hamltonan The sngle partcle wave functons The many body wave functons The Normal ordered Hamltonan The Hamltonan matrx elements II Numercal many body methods 29 5 The Full Confguraton Interacton method Introducton to the method Error analyss of the FCI energes Extrapolaton Formulas

5 6 Projector Monte Carlo Methods Introducton Dffuson Monte Carlo The projecton operator and the short tme approxmaton The Dffuson Monte Carlo Algorthm Systematc errors The Hartree-Fock method Overvew The Roothaan-Hartree-Fock equatons III Full Confguraton Interacton Quantum Monte Carlo 42 8 The FCIQMC algorthm The mathematcal approach Populaton control and the statstcal estmators The FCIQMC algorthm and smulaton procedures Convergence crtera and the tme step error The FCIQMC sgn problem Intator-FCIQMC The ntator spaces Samplng rules The suggeston probablty dstrbuton Samplng of the suggeston probabltes Fndng the sets S n Storng and accessng the Coulomb matrx elements Indexng scheme Mappng of the ndces to a ponter Graphcal representaton of model spaces The arrow weght matrces Memory requrements and numercal speed Generalsaton to other systems IV Data analyss Statstcal analyss The statstcal error Flyvbjerg-Petersen analyss Flyvbjerg-Pedersen analyss for FCIQMC

6 13 Curve fttng The least square method Error analyses of the least square ft Error analyss of E(R) V Implementaton of the FCIQMC algorthm Numercal representaton of the determnants, the bass and the state vector The determnants The sngle partcle bass The state vector A sngle FCIQMC teraton Parallelzaton Introducton Numercal lbrares The dstrbuton of walkers on the MPI tasks Load balancng The program flow Organzaton of the code and the classes Overvew The ntsmulaton class The runsmulaton class The walkercontanerclass class The sortwalkers class The walkerdstrbuton class The loadbalancethreads class The walkerpropagator class The hamltonanelements class The lbgrie class The nputvars class Benchmarkng the Code Hotspots by CPU usage Scalng wth openmp and MPI Scalng wth the number of walkers The determnant load parameter and the redstrbuton threshold Modfyng the code to smulate other systems 106 6

7 VI Results Valdatng the code Introducton FCIQMC A smple system, two electrons n two shells Comparng FCIQMC energes and FCI energes Testng FCIQMC energes aganst analytcal results FCIQMC The statstcal estmators The scalng of the algorthm The crtcal number of walkers Scalng of the FCIQMC algorthm Concludng remarks Convergence of the -FCIQMC energes Smulatons wth a Hartree Fock bass Extrapolated energes Introducton Testng the extrapolaton formula Open shell results for ω = 1/ Open shell results for ω = Summary and Comments Conclusons and perspectves Summaton and achevements Ideas and prospects Concludng remarks VII Appendx 137 A Runnng the code 138 7

8 Chapter 1 Introducton 1.1 Motvaton Full Confguraton Interacton Monte Carlo (FCIQMC) s an ab nto method to calculate ground state propertes of quantum many body systems. The algorthm was frst made famous by Booth et. al. (2009) [3], and has snce then seen great success wthn the quantum chemstry communty. Our motvaton was to mplement FCIQMC and apply the algorthm on a new physcal system, namely two dmensonal quantum dots n parabolc potentals. Ths s a well explored physcal system, whch have been smulated usng a range of numercal methods lke Varatonal Monte Carlo [11], Dffuson Monte Carlo [22], Coupled Cluster [24] and Full Confguraton Interacton [20, 15, 16], and therefore provdes a good benchmark for the FCIQMC algorthm. Furthermore, the degree of correlaton can be tuned by ncreasng or decreasng the strength of the confnng potental, and thus allows us to study how the algorthm performs wth systems of varyng correlaton strength. 1.2 Achevements Our man achevement s to successfully have mplemented FCIQMC, whch s a rather novel algorthm wth few mplementatons. The mplementaton was a challengng task, and qute a lot of thnkng and expermentng was necessary to fnd vable solutons. The man obstacles was to fnd an effcent numercal representaton of the state vectors wth a low memory footprnt, and to develop a fast parallel algorthm wth a small overhead usng a hybrd approach wth multthreadng and MPI. Some work has also been nvested n fndng a practcal and relable way of analyzng the stochastc error of the so called projected estmator of the energy and developng a fast algorthm for storng and retrevng the Hamltonan matrx elements. We have studed how the algorthm performs when appled on quantum dots wth a dfferent degree of correlaton and a dfferent number of partcles. We beleve that these results can be generalzed to other systems as well, and ths may be seen as our man results. We have also made a few predctons of the energy of quantum dots wth N P 6 partcles usng extrapolaton of the energes, and have demonstrated that open shell Dffuson Monte Carlo calculatons of the same systems produce ground state energes that are slghtly too hgh. 8

9 1.3 The structure of the thess Ths thess s organzed n sx parts I: In the frst part we dscuss the theoretcal background for our project. An ntroducton to quantum mechancs s followed by a short revew of the physcs of quantum dots. II: In the second part we dscuss dfferent numercal many body methods such as Full Confguraton Interacton, Dffuson Monte Carlo and Hartree-Fock. It s mportant to have a certan knowledge of these methods, both as a theoretcal bass to understand the FCIQMC algorthm and as a theoretcal background when we nterpret our results. III: The thrd part s devoted to the theory and the numercal methods that s drectly relevant for the FCIQMC method. Three chapters are devoted to the FCIQMC algorthm, the ntator adapton -FCIQMC and samplng methods. In the last chapter we have ntroduce an ndexaton and storage scheme for the Hamltonan Matrx elements. IV: In the fourth part we are dealng wth methods to analyse output data from the smulatons. We have dscussed how the statstcal error can be calculated and how we can ft parametrzed curves to extrapolate the results. V: In the ffth part we dscuss dfferent aspects of our mplementaton of the algorthm. The numercal representaton of the state vector and the parallelzaton of the algorthm are dscussed n the frst chapters. Next we look at the class structure and mplementaton detals. One chapter s devoted to benchmarkng and testng of the code, and n the last chapter we dscuss how t can be modfed to smulate other physcal systems. VI: In the last part we study how the algorthm performs when appled to two dmensonal quantum dots. Ths s an excellent test case snce the degree of correlaton n the systems can be vared by changng the potental strength.we have demonstrated that the code s reproducng Full Confguraton Interacton, Coupled Cluster and Dffuson Monte Carlo results, and we have calculated the ground state energes for a few open shell systems wth N P 6 partcles by extrapolatng the energes. We have also explored the scalng and effcency of the algorthm for systems wth a dfferent number of partcles and dfferent nteracton strengths. 9

10 Part I General theory 10

11 In ths part we gve a short ntroducton to non- relatvstc quantum mechancs and quantum many body theory. We wll also ntroduce the so called second quantzed notaton and dscuss how quantum dots are modelled mathematcally. The followng chapters are not n any way meant to be a thorough ntroducton to the subjects that we dscuss. Our goal here s twofold. Frst we want to ntroduce the necessary concepts, theorems and equatons that we need later n the thess, and second, we want to establsh a certan mathematcal notaton. 11

12 Chapter 2 A bref ntroducton to Quantum Mechancs Ths chapter s not meant to be a self-contaned ntroducton to quantum mechancs, but rather a bref ntroducton to the concepts that have a drect relevance to ths thess. Those who are lookng for a complete ntroducton to quantum mechancs are referred to an ntroductory text on the subject. See for example Ref. [31] for a thorough ntroducton to the feld. We wll frst state the fundamentals of quantum theory followed by the ntroducton of a few mportant propertes of quantum mechancal systems. 2.1 The postulates of quantum mechancs Quantum mechancs s an axomatc theory, meanng that t s based on a set of postulates from whch the theory s logcally derved. In ths chapter we are presentng the fundamental axoms or postulates of quantum mechancs and explan ther meanng. Ths wll serve as an extremely bref ntroducton to the man concepts of the theory. The frst postulate concerns the mathematcal representaton of a quantum mechancal system. Postulate 1: To every quantum system there s an assocated separable nfnte dmensonal complex Hlbert space H. A quantum state s represented as a normalzed state vector Ψ H. The Hlbert space, also called the state space, s a lnear vector space wth an nner product, and can be vewed as a generalzaton of Eucldan spaces. Hlbert spaces are characterzed by a set of propertes that we wll now present. Assume that the vectors { a, b } are n H. Then the nner product of two elements a, b s denoted a b, and obeys the followng relatons (): The Hermtan symmetrc property: a b = b a, (): Lnearty n the frst element : ca b = c a b, (): Conjugate lnearty n the second element : a cb = c a b, (v): Addtvty n the frst element: a 1 + a 2 b = a 1 b + a 2 b, 12

13 (v): Addtvty n the second element: a b 1 + b 2 = a b 1 + a b 2. (v): Postvty: a a 0. Ths s a general defnton of Hlbert spaces, but as we wll see later, the exact propertes of the state spaces are defned by the physcal system tself. Note that any lnear combnaton of vectors n H (except the null vector) represents a physcal state. If we for example have a bass { } n =0 for the state space, any state can be represented as Ψ = n =1 c, (2.1) where {c } are complex weghts, or ampltudes, wth the property n =1 c 2 = 1 to normalze the state. The next postulate concerns the jont Hlbert spaces of composte systems. An example of such systems s the many electron quantum dots that we are dealng wth n ths thess. Postulate 2: If H 1 and H 2 are the Hlbert spaces of two quantum systems, then the Hlbert space of the composte system s the tensor product H 1 H 2. For example, for a system consstng of a number of dstngushable partcles n the states a, b and c, the composte state s the tensor product Ψ = a b c. We wll come back to ths subject later when we dscuss many partcle systems, and as we wll see, ths theorem provdes us wth an easy way of constructng the Hlbert spaces of many partcle quantum dots. The state vector encapsulates all there s to know about the physcal system, but quantum mechancal systems are not well defned n a classcal sense where all measurable quanttes can be known smultaneously. They are abstract quanttes that contan nformaton about the probablty of the dfferent outcomes of a measurement. The lnk between the abstract state vectors and measurable quanttes are defned by Hermtan 1 operators as stated n the thrd postulate. Postulate 3: To every observable of a quantum mechancal system, there s assocated a lnear Hermtan operator. The spectrum of egenvalues of the operator represents the measurable values of the observable. An observable s defned as a measurable quantty such as the angular momentum or the energy of the state. Hermtan operators have the propertes that all egenvalues are real and that the egenvectors span the space n whch the operators act. An example s the Hamltonan operator, Ĥ whch represents the energy of the state. Assume that the operator Ĥ has the egenvalues {ε } n =1 and the egenvectors { ϕ } n =1 whch are a bass for H. The egenvalues represents the possble outcome of a measurement of the energy and the correspondng egenvectors are the states n whch the observable s sharp or well defned. The probablty of measurng the energy ε s gven by the functon P (ε ) = Ψ ϕ ϕ Ψ, (2.2) 1 An operator Ô s Hermtan f Ô = Ô, where Ô s the adjont operator of Ô. 13

14 whch means that the energy s well defned only n the case that the state vector s parallel to an egenvector of Ĥ, or n the subspace spanned by the egenvectors { ϕ 1, ϕ 2,..., } wth degenerated energes ε 1 = ε 2 =.... As a consequence of ths, t s mpossble to smultaneously measure the observables of two operators Ĥ and Ô wth non-parallel egenvectors, or more precsely, whch are not commutng [Ô, Ĥ] 0. Ths property s commonly known as the uncertanty prncple. A measurement n quantum mechancs s a subtle concept whch often gves rse to phlosophcal dffcultes. The fourth postulate bypasses these problems by provdng an operatonal defnton of an deal measurement. Postulate 4: A deal measurement of an observable o wth the correspondng operator Ô leaves the quantum system n the state o where Ô o = o o. As an example, f the quantum state Ψ s measured to have the energy ε, we say that the state has collapsed nto the correspondng egen state ϕ. Ths s modeled as an nstantaneous process where the state changes from Ψ to ϕ. Note that both the concept of an deal measurement and the dea of an nstantaneous change of the physcal state s controversal, and that ths s an actve feld of research (see for example Ref. [32]). The ffth and last postulate states that the dynamcs of a quantum system s descrbed by a famous partal dfferental equaton. Postulate 5: The tme evoluton of the state vector s descrbed by the tme dependent Schrödnger equaton. The Schrödnger equaton can be wrtten h Ψ(t) = Ĥ Ψ(t), (2.3) t where h s the reduced Planck s constant and Ĥ s the Hamltonan operator. The quantum mechancal wave functon Ψ(t) s a probablty ampltude that descrbes the state of a quantum mechancal system at tme t. If Ψ(t) s an egenstate of the Hamltonan Ĥ, then Ψ(t) s sad to be statonary and s represented by the wave functon Ψ(t) ϕ e ε t/ h, (2.4) where { ϕ } are the egenstates of the Hamltonan and {ε } are the egenvalues Ĥ ϕ = ε ϕ. (2.5) Obvously, ths can only be true f the Hamltonan has no explct tme dependence. Snce the Hamltonan operator s a hermtan operator, the set of egenstates ϕ s a bass. Ths means that as long as the Hamltonan operator s tme ndependent, all quantum states can be wrtten as a lnear combnaton of the statonary states Ψ(t) = φ e ε t/ h, (2.6) whch always solves the Schrödnger equaton. In ths thess we are only dealng wth tme ndependent quantum mechancal systems. Therefore we are only solvng the egen value problem Eq. (2.5) whch we wll refer to as the statonary Schrödnger equaton from now on. Snce the statonary states are egenstates of the Hamltonan, the egenvalues ε must be nterpreted as the possble energes of the system. The lowest egenvalue ε wll be referred to as the ground state energy of the system and the correspondng egenstate wll be referred to as the ground state. 14

15 2.2 Antsymmetrc state vectors and fermons A state vector whch descrbes a system of dentcal partcles can be shown to be ether symmetrc or antsymmetrc wth regard to the nterchange of two partcles. And as we wll show n ths secton, ths seemngly nnocent property has mportant consequences for the behavour of quantum partcles. We wll start wth a short dervaton. Consder an N partcle quantum system of dentcal partcles Ψ = 1, 2, 3,..., N. (2.7) Ths state must have the same physcal propertes when two partcles are permuted, whch means that for any observable Ô, Ψ ˆP jô ˆP j Ψ = Ψ Ô Ψ, (2.8) where ˆP j s the nterchange operator whch has the property that t nterchanges partcle and j. Ths equaton has the soluton ˆP j Ψ = ± Ψ. (2.9) All elementary partcles are separated n two categores of partcles called fermons, as for example electrons, and bosons, wth the property that ˆP j Ψ = Ψ for fermons and ˆP j Ψ = + Ψ for bosons. (2.10) A state vector descrbng a system of fermons s therefore sad to be antsymmetrc wth regard to the exchange of two partcles. A property of such states s that two dentcal fermons never can be n the same quantum state. Ths s easy to see f we consder the two partcle state 1, 1 consstng of two dentcal fermons. If we apply the nterchange operator on ths state we see that whch shows that ths state can not exst. 1, 1 = ˆP 12 1, 1 = 1, 1 1, 1 = 0, (2.11) 2.3 Spn and the spn statstcs theorem Spn s an ntrnsc property of elementary partcles whch have no classc counterpart but s a pure quantum phenomenon. The spn quantum number s can only have values s hn/2 where n s a postve nteger and h s the reduced Planck constant. Any gven class of elementary partcles has a fxed spn value that can not be changed, and accordng to the spn statstcs theorem, all partcles wth half nteger spns are fermons [31]. The spn of a sngle electron s descrbed by a spnor χ C 2 whch s usually represented as a lnear combnaton of the egen functons of the spn projecton operator Ŝz and a many partcle wave functon has a spnor Ŝ z = s, Ŝ z = s, (2.12) χ {, } {, } {, }... {, }. (2.13) 15

16 There s a magnetc moment assocated to the spn of an electron. The magnetc moment contrbute to the total energy of an electronc system, but ths effect s usually very small and s often neglected. In ths thess, we use a Hamltonan wthout a spn couplng, meanng that we gnore the magnetc moment of the electrons. However, the spn of the partcles stll have a very mportant mpact on the behavour of a quantum system. Because of the Paul excluson prncple, the many partcle bass s lmted to states where any one partcle state s occuped by at most one electron. 2.4 The many partcle bass The bosonc and fermonc many partcle wave functons lve n dfferent Hlbert spaces and follow dfferent mathematcal rules. We wll only focus on fermonc systems snce these are the systems that we have dealt wth n ths thess. We assume that an orthonormal sngle partcle bass s avalable { ϕ 1, ϕ 2, ϕ 3,... } H 1, (2.14) where {ϕ } are egenfunctons of the sngle partcle hamltonan ĤHO, and the spnors are assumed to be contaned n the sngle partcle wave functons. Accordng to the second postulate, the many partcle Hlbert space s H = H 1 H 1 H 1... H 1. (2.15) In the case that the sngle partcle states are representng fermons, the many partcle states must be antsymmetrc wth regard to a permutaton of two partcles. The ant symmetrc Hlbert space of an N partcle system s a subspace of H and can be wrtten H AS = { Ψ H ; ˆP j Ψ = Ψ j 1, 2, 3,..., N}. (2.16) To construct the antsymmetrc states we wll defne the antsymmetrzaton operator Â. And to do so, we must frst defne the permutaton operator ˆP 1, 2,..., N whch changes the sequence of partcles (1, 2, 3,..., N) to ( 1, 2,..., N ). Any permutaton of the partcles can now be wrtten as a product of p nterchange operators where the sgn of the permutaton s The permutaton operator s defned as Â = ˆP 1, 2,..., N = ˆP m1,n 1 ˆPm2,n 2... ˆP np,m p, (2.17) S 1, 2,..., N = ( 1) p. (2.18) 1 ˆP1, 2,..., N S 1, 2,..., N, (2.19) All possble permutatons ˆP... N! where 1/ (N!) s a normalzaton factor (note that the total number of possble permutatons s N!). Snce ˆP j Â = Â, we see that the states ϕ 1 ϕ 2 ϕ 3... ϕ N Â ϕ 1 ϕ 2 ϕ 3... ϕ N, (2.20) 16

17 are antsymmetrc. The adjont of P j s the permutaton operator defned by the nverse permutaton Pj 1 = P j meanng that P T = P 1. Consequently, each permutaton operator s a untary operator and therefore Â must be as well. Snce Â s a untary operator the normalzaton and orthogonalty of the product states are conserved. The antsymmetrzed product states, wth 1 < 2 < 3 <... N, consttute a bass for H AS, H AS = span{ ϕ 1 ϕ 2 ϕ 3... ϕ N ; 1 < 2 < 3 <... N }, (2.21) where we have set 1 < 2 < 3 <... N to assure that all partcles are n a unque state and that every state s represented only once. The states ϕ 1 ϕ 2 ϕ 3... ϕ N are often wrtten as determnants ϕ 1 ϕ 2 ϕ 3... ϕ N = 1 1 det(ϕ 1, ϕ 2, ϕ 3,..., ϕ N ) N! N! ϕ (1) 1 ϕ (1) 2... ϕ (1) N ϕ (2) 1 ϕ (2) 2... ϕ (2) N ϕ (N) 1 ϕ (N) 2... ϕ (N) N, (2.22) where the superscrpt () denotes the th state n a drect product. Determnants are very useful snce they have the desred antsymmetrc propertes, and the many partcle state vectors are often slopply referred to as determnants. 2.5 The many partcle operators As we have already dscussed, observables are represented by Hermtan operators whch map H AS to tself. Many partcle operators are often categorzed after the number of partcles that are nvolved n an nteracton. The one body operators nvolve sngle partcle nteractons, the two body operators nvolve two partcle nteractons and so on. Our Hamltonan, as we wll see later, only contans one and two body operators, and we wll therefore take a closer look at these cases. The one body operators takes the form Ĉ = ĉ ĉ ĉ, (2.23) where ĉ s an operator n the sngle partcle Hlbert space H 1. On the sngle partcle bass { ϕ } the operator s wrtten ĉ = ϕ ϕ j ϕ ĉ ϕ j, (2.24) j and the ampltude for an nteracton between the two many body states ϕ n1... ϕ nn takes the smple form ϕ n1... ϕ nn Ĉ ϕ m 1... ϕ mn = N ϕ n ĉ ϕ mj. (2.25),j=1 17

18 A general two partcle operator ˆD can be wrtten as a sum of operators ˆD = 1 2 j ˆD j (2.26) where D j s defned on the two partcle space H 1 H 1 of partcle and j and can be wrtten on the form. ˆD j = d klmn ϕ k ϕ m ϕ l ϕ n , (2.27) klmn th place j th place and the nteracton ampltude s ϕ n1... ϕ nn ˆD ϕ m1... ϕ mn = N d n n j m k m l. (2.28),j,k,l=1 The operators representng an observable must obvously map H AS to tself. Ths mples, as s straght forward to show, that the operators must commute wth the antsymmetrzaton operator Â. Ths s always the case for the one body operator Ĉ, but for the two body operator ˆD t sets some restrctons on the form of d klmn. In our case, the two body operator s the Coulomb nteracton for whch d klmn = d mnkl, whch means that ˆD and Â commute. In fact, our Hamltonan s completely symmetrc wth regard to the exchange of two partcles snce all electrons have the same mass and charge. Ths means that the nterchange operators have no effect on the operator, and consequently t commutes wth the antsymmetrzaton operator. 2.6 The varatonal prncple The varatonal prncple s a general property of quantum systems, and s also the foundaton of several numercal many body methods. It states that any state Ψ H has an expectaton value for the energy Ψ Ĥ Ψ whch s larger or equal to the ground state energy. Furthermore, f the expectaton value s equal to the ground state energy, Ψ s a ground state of the Hlbert space. We wll show why ths s true for nondegenerate ground states. Assume that Ψ s expanded n the bass of egenfunctons { ϕ } of the Hamltonan operator The expectaton value of the Hamltonan s Ψ = c ϕ. (2.29) Ψ Ĥ Ψ = c 2 ε, (2.30) where {ε } are the egenvalues of Ĥ and {c } are complex weghts. Snce the ground state energy ε 0 s smaller or equal to all other ε, the normalzaton requrement c 2 = 1 mples that c 2 ε ε 0, (2.31) and only equal n the case that c 0 = 1 and c = 0 for > 0, n whch case Ψ s the ground state ϕ 0. The proof s easly generalzed to systems wth degenerate ground states, but n ths case the ground state s no longer unque. 18

19 Chapter 3 A bref ntroducton to the second quantzaton formalsm We wll now ntroduce a formalsm whch s very useful when dealng wth systems of many nteractng partcles. The formalsm wll be presented very brefly and wth no proofs. There are numerous texts whch gve a good ntroducton to the second quantzaton formalsm together wth relevant theorems and proofs. See for example Shavtt et. al. (2009) [33] or Gross et. al. (1991) [9]. 3.1 Introducton The second quantzaton formalsm makes use of creaton and annhlaton operators to add or remove partcles to a quantum state. The annhlaton operator a removes a state ϕ from a state, and the creaton operator a adds a state ϕ, a 2 ϕ 1 ϕ 2 ϕ 3... ϕ N = ϕ 1 ϕ 3... ϕ N, (3.1) a 2 ϕ 1ϕ 3... ϕ N = ϕ 1 ϕ 2 ϕ 3... ϕ N. (3.2) By defnng a set of antcommutaton relatons for these operators, we can ensure that the state vectors have the correct antsymmetrc propertes. The followng equatons can be shown to apply to fermonc systems It follows that {a, a j } = {a, a j } = 0, (3.3) {a, a j} = δ j. (3.4) a ϕ n 1 ϕ n2 ϕ n3... ϕ nn = 0 f {n 1, n 2,..., n N }, (3.5) a ϕ n1 ϕ n2 ϕ n3... ϕ nn = 0 f {n 1, n 2,..., n N }. (3.6) As follows from Eqs. (3.3), (3.4), these relatons assure that the state vectors have the correct symmetry propertes snce any permutaton of two dfferent operators (partcles) wll lead to a sgn change. The antsymmetrc propertes are now contaned n the algebrac expressons for the state vectors. Also note that Eq. (3.3) s an expresson of the Paul prncple. 19

20 Although ths s not strctly correct, these state vectors are often smply referred to as determnants. The reason s that state vectors that are represented as determnants have many of the same mathematcal propertes as the second quantzaton state vectors. We wll, n the rest of ths thess, comply to the sloppy language and refer to a general state vector as a determnant. Any such determnant can be wrtten a n 1 a n 2 a n 3... a n N = ϕ n1 ϕ n2 ϕ n3... ϕ nn, (3.7) where the ndces n must be ordered n some way to make sure that each determnant s only represented once. We use the orderng n 1 < n 2 < n 3 < < n N. The state represents the vacuum state whch symbolzes that there are zero partcles present n the system. Note that the vacuum state s normalzed = 1 and that an annhlaton operator workng on the vacuum state, a, s defned to be zero. An mportant property that follows from the antcommutaton relatons s that the orthogonalty of the determnants are preserved. Ths can be shown by evaluatng the nner product of a m1... a mn and a n1... a nn whch yelds as follows from Eq. (3.4). a mn... a m1 a n1... a nm = δ m1,n1... δ mn,nn, (3.8) 3.2 Operators n the second quantzaton notaton Operators can also be represented n terms of creaton and annhlaton operators. One body operators can be wrtten on the form whle two body operators can be wrtten on the form Ĉ = ϕ Ĉ ϕ j a a j, (3.9) j ˆD = 1 2 jkl ϕ ϕ j ˆD ϕ k ϕ l a a j a la k. (3.10) The last two equatons can be verfed by consderng the earler secton about many partcle operators. It s often convenent to rewrte the two body operators as ˆD = 1 4 jkl ϕ ϕ j ˆD ϕ k ϕ l AS a a j a la k, (3.11) ϕ ϕ j ˆD ϕ k ϕ l AS ϕ ϕ j ˆD( ϕ k ϕ l ϕ k ϕ l ). (3.12) A dervaton of the above denttes can be found n most texts coverng many body quantum mechancs, and a good reference s Shavtt (2009) [33]. The evaluaton of expectaton values of products of operators are often made much smpler by fndng the so called normal ordered form. We wll make use of normal ordered operators later, and we wll therefore spend the rest of ths secton to defne normal orderng. A product of operators are sad to be normal ordered when all creaton operators are to the 20

21 left of all annhlaton operators. As an example, consder the operators Â, ˆB, Ĉ whch are products of annhlaton and creaton operators The normal orderng of these operators are Â = a a1 a a2, (3.13) ˆB = a b1 a b2, (3.14) Ĉ = a c1 a c2. (3.15) {Â ˆBĈ} = ( 1)p a a1 a b1 a c1 a a2a b2 a c2. (3.16) Ths s a reorderng where the annhlaton operators are set to the rght of the creaton operators, and p s the number of permutatons necessary to reorder the operators. An nterchange of two creaton or two annhlaton operators would not destroy the normal order, thus the normal orderng s not unque. 3.3 The tme ndependent Wck s theorem Wck s theorem provdes an effcent way of calculatng expectaton values. We wll state ths theorem wthout a proof, but the full proof can be found n many texts as for example Gross et. al. (1991) [9]. Before we state the theorem we must ntroduce contractons. A contracton of two creaton or annhlaton operators Â, ˆB s defned as Â ˆB = Â ˆB {Â ˆB}. (3.17) Both Â and ˆB are ether a creaton or an annhlaton operator, whch means that there are four possble combnatons a a j = a a j a a j = 0, (3.18) a a j = a a j a a j = 0, (3.19) a a j = a a j a a j = 0, (3.20) a a j = a a j ( )a a j = {a, a j } = δ j. (3.21) Further more, contracton of operators wthn a normal product can be wrtten {Â... ˆB... Ĉ... ˆD... Ê... ˆF } = ( 1) p ˆB ˆDĈÊ... {Â... ˆF }, (3.22) where p s the number of permutatons whch are needed to move the contracted operators n front of the normal ordered product. Wck s theorem says that a product of creaton and annhlaton operators s equvalent to the normal ordered product plus the sum of the normal ordered products wth all possble contractons. Â ˆB... = {Â ˆB... } + all sngle c. {Â ˆB } + {Â ˆB } +... (3.23) all double c. We can use Wck s theorem to rewrte operators as a sum of normal ordered products. Ths s a convenent form snce vacuum expectaton values {Â... } always are zero. 21

22 3.4 Partcle- hole formalsm Untl now, we have used the vacuum state as the reference state. It s however often natural to use other states as the reference state n whch case t s convenent to use the so called partcle-hole formalsm. Mathematcally, the partcle hole formalsm s very smlar to the vacuum reference formalsm, but some mathematcal concepts, lke normal orderng of the operators and contractons, have to be redefned. Assume that we want to use the state c = a c1... a cn, where N s the number of partcles, as our new vacuum state, and defne all other states relatve to ths state. The states { ϕ c } N =1 are now referred to as hole states whle all other states are referred to as partcle states. In the rest of ths secton we wll let ndces, j, k, l symbolze hole states and ndces a, b, c, d symbolze partcle states. Now, any state can be wrtten as an exctaton of the reference state c ab... j... a aa b... a ja c. (3.24) The creaton operator a s sometmes called a pseudo annhlaton operator snce t annhlates a hole. Lkewse, an annhlaton operator of a hole state, a, can be called a pseudo creaton operator snce t creates a hole. The full set of psuedo operators are defned The exted states are now wrtten b a = a a b a = a a } (acts on partcle states), b = a b = a } (acts on hole states). (3.25) c ab... j... b ab b... b j b c. (3.26) Wth ths new set of operators, the antcommutaton relatons, contractons and Wck s theorem wll have exactly the same defnton as n the vacuum reference formalsm. 22

23 Chapter 4 Mathematcal Modellng of two dmensonal Quantum dots 4.1 Introducton In the most general sense of the term, a quantum dot s a small quantum system whch s confned n space. But the term s most often used to descrbe electronc systems that are trapped n semconductor structures. Durng the last few decades, advanced processng technques have made t possble to manufacture artfcal quantum dots whch are trapped n one or two spatal dmensons, and confnement to less than three dmensons was expermentally verfed already n the early 1970s n GaAs-A1GaAs semconductors [25]. Confnement by potentals set up by electrostatc gates also allows expermentalsts to control ther shape and sze and the number of electrons whch ranges from one to hundreds [26]. Our dealzed quantum dots are modelled as two dmensonal systems, whch can be justfed both theoretcally and expermentally. As we wll come back to later, our quantum dot wave functon s separable n the spatal dmensons. Consequently, f the component perpendcular to the semconductor layers are always n the ground state, ths component can then be left out of the descrpton. We wll not dscuss the real quantum dots any further, but wll from now on concentrate on our dealzed mathematcal model. For more nformaton on quantum dots we refer to the revew artcle of Ref.[26], whch dscusses both numercal and expermental results. 4.2 The Hamltonan The two dmensonal quantum dots can be descrbed by a Hamltonan operator on the form Ĥ = Ĥ0 + ˆV, (4.1) where Ĥ0 s an one-body operator whch accounts for the sngle partcle knetc energes and the nteracton wth an external electromagnetc feld, and ˆV s a two-body operator whch accounts for the Coulomb nteracton between the electrons. The sngle partcle 23

24 Hamltonan can be wrtten on the general form Ĥ 0 H 0 (r 1,..., r N, p 1,..., p N ) = ( 1 m [p e 2 c A(r )] + V ext (r )), (4.2) where A(r) s the vector potental of an external magnetc feld, and (r, p ) s the poston and momentum coordnates. The magnetc moment of the electrons s gnored, thus the spn-spn and the spn-orbt couplngs are assumed to be neglgble. We also assume that there s no external magnetc feld and set the vector potental A(r) to zero. The external electrostatc feld V ext (r) s taken to be a parabolc potental. Ths s a much used approxmaton n theoretcal physcs, snce the shape of any conservatve potental s approxmately parabolc close to the mnma. Wth these assumptons, the one-body part of the Hamltonan becomes Ĥ H0 = ˆT + ˆV ext, (4.3) ˆT 1 m p = h 2 2 2m, (4.4) ˆV ext mω2 2 R2, R = (r 1,..., r N ), (4.5) where ω s the oscllator frequency, h s the reduced Planck constant and m s the electron mass. To smplfy the expressons we choose to measure energes n unts of hω and lengths n the unts of ( h/(mω)) 1/2. The sngle partcle Hamltonan can then be wrtten on the dmensonless form and the Coulomb operator can be wrtten Ĥ HO (r) = 1 2 r2 2 r 2, (4.6) λ ˆV V (r, r j ) =, r j = r r j, λ = 1 <j <j r j hω ( e2 4πɛ 0 ɛ ). (4.7) 4.3 The sngle partcle wave functons We wll frst take a look at the egenstates of the sngle partcle Hamltonan Ĥ0 whch we wll refer to as spn orbtals or smply as the sngle partcle wave functons. We wll later use these to construct the many partcle bass usng the same strategy as we dscussed n the prevous chapters. Snce the Hamltonan has no spn couplng, the electron spns does not affect the descrpton of the sngle electrons. Therefore, we wll leave them out for now and explctly nclude them n the mathematcal descrpton n the next secton where we dscuss the many partcle bass. We denote the sngle partcle wave functons {ϕ 1, ϕ 2,... }. These are found by frst notng that the sngle partcle Hamltonan can be wrtten as a sum Ĥ HO = 1 2 (x 2 δ2 δx 2 ) (y2 δ2 δy 2 ), r = (x, y ). (4.8) 24

25 Fgure 4.1: Illustraton of the sngle partcle bass, and an example of how the spn orbtals can be ndexed. Each arrow corresponds to a spn +1/2 state, and each corresponds to a spn 1/2 state. Each vertcal lne corresponds to an orbtal wth a gven magnetc quantum number m and prncpal quantum number n. The numbers next to the arrows corresponds to the ndex of the spn orbtal. For example, ϕ 0 has spn 1/2, m = n = 0 and ϕ 10 has spn +1/2 and n = 0, m = 2. Thus the soluton s separable, and can be wrtten as a product of functons that depends only on one of the varables {x } or {y }. These are well known equatons whch descrbe the behavour of the so called quantum harmonc oscllator [31], and the solutons can be represented n many ways. Because of the rotatonal symmetry of the quantum dots, t s often convenent to represent the sngle partcle wave functons as the egenfunctons of the angular momentum operator ˆL Z whch can be shown to commute wth ĤH0. In two dmensons and n polar coordnates ϕ can be wrtten 1/2 2n! 1 ϕ (r) ϕ n,m (r, θ) = [ (n + m )! ] 2π emθ r m Ln m (r 2 )e r2 /2, r = r, (4.9) whch are the so called Fock- Darwn orbtals, and where L m are the Laguerre polynomals. Each s mapped to a unque par of quantum numbers (n, m), where m s the magnetc quantum number and n s the prncpal quantum number n 0, 1, 2,..., m { n, n + 2,..., n}. (4.10) The sngle partcle energy can be shown to be n unts of hω. dr rdθ ϕ n,m(r, θ)h HO (r, θ)ϕ n,m (r, θ) = 2n + m + 1, (4.11) 4.4 The many body wave functons The many body wave functons are constructed from the harmonc oscllator egenfunctons. But to descrbe the many partcle system, we must also nclude spn. Each sngle partcle 25

26 wave functon has a trplet of quantum numbers (m, n, s) whch unquely specfes the quantum state. ϕ = ϕ m,n χ s, (4.12) where χ s s the spnor. The N partcle wave functon s the determnant D = ϕ 1,..., ϕ N 1 N! det (ϕ 1,..., ϕ N ), (4.13) whch we wll wrte n second quantzed form as D = a 1... a N. (4.14) 4.5 The Normal ordered Hamltonan To smplfy the calculaton of dfferent expectaton values, we want to express the Hamltonan on normal ordered form n the partcle-hole formalsm. Our Hamltonan can be wrtten on the general form Ĥ = ĤHO + ˆV = ϕ p ĤHO ϕ q a pa q + 1 pq 4 pqrs ϕ p ϕ q ˆV ϕ r ϕ s AS a pa qa s a r. (4.15) We wll now rewrte the Hamltonan n normal ordered form n the partcle-hole formalsm. We defne the contractons where D α can be any state We use the conventon that the ndces can represent any sngle partcle wave functon whle a pa q = D α a pa q {a p, a q } D α, (4.16) D α = a α 1 a α 2... a α N. (4.17) p, q, r, s [0, 2M], (4.18), j, k, l {α 1, α 2,..., α N }, a, b, c, d {α 1, α 2,..., α N }. (4.19) Here, 2M s the total number of spn orbtals, and {α 1, α 2,..., α N } are the ndces of the occuped orbtals. Note that we do not necessarly take, j, k, l to be states below the Ferm level. Accordng to Wck s theorem and usng the fact that ϕ p are the egenfunctons of ĤHO we arrve at the expresson Ĥ HO = ϕ p ĤHO ϕ q ({a pa q } + {a pa q }) = ϕ p ĤHO ϕ p {a pa p } + ϕ ĤHO ϕ. pq p (4.20) 26

27 The Coulomb energy operator ˆV s also rewrtten on normal form usng Wck s theorem. Only the nonzero contractons are ncluded n the dervaton ˆV = 1 4 ϕ p ϕ q ˆV ϕ r ϕ s AS ({a pa qa s a r } + {a pa qa s a r } + {a pa qa s a r } pqrs + {a pa qa s a r } + {a pa qa s a r } + {a pa qa s a r } + {a pa qa s a r }) (4.21) = 1 4 ϕ p ϕ q ˆV ϕ r ϕ s AS {a pa qa s a r } + ϕ p ϕ ˆV ϕ q ϕ AS {a pa q } pqrs pq ϕ ϕ j ˆV ϕ ϕ j AS. (4.22) j The total Hamltonan can now be wrtten Ĥ = ϕ p ĤHO ϕ p {a pa p } + 1 p 4 pqrs ϕ p ϕ q ˆV ϕ r ϕ s AS {a pa qa s a r } + ϕ p ϕ ˆV ϕ q ϕ AS {a pa q } + E α (4.23) pq E α = ϕ ĤHO ϕ ϕ ϕ j ˆV ϕ ϕ j AS. (4.24) j 4.6 The Hamltonan matrx elements We wll now use the normal ordered Hamltonan to fnd the general expressons for the Hamltonan matrx elements D α Ĥ D β. From Eq. (4.24) we see that only states that dffers by less than three orbtals are connected whch means that all non zero matrx elements can be wrtten ether as a dagonal element as the ampltude of a sngle exctaton or as the ampltude of a double exctaton D α Ĥ D α, (4.25) D α Ĥ{a aa } D α, (4.26) D α Ĥ{a aa b a a j } D α. (4.27) Here we have used the same conventons for the ndexng as n Eq. (4.19), wth D α as the reference state. The closed form expressons of the matrx elements are calculated by nsertng the normal ordered Hamltonan (4.24). For the dagonal elements we obtan the expresson D α Ĥ D α = ϕ ĤHO ϕ ϕ ϕ j ˆV ϕ ϕ j AS. (4.28) j The sngle exctatons have the ampltude D α Ĥ{a aa } D α = ϕ p ϕ j ˆV ϕ q ϕ j AS {a pa q }{a aa } = ϕ a ϕ j ˆV ϕ ϕ j AS, (4.29) pqj j 27

28 and the double exctatons have the ampltude D α Ĥ{a aa b a a j } D α = 1 4 ϕ p ϕ q ˆV ϕ r ϕ s AS pqrs ({a pa qa s a r }{a aa b a a j } + {a pa qa s a r }{a aa b a a j } +{a pa qa s a r }{a aa b a a j } + {a pa qa s a r }{a aa b a a j }) = ϕ a ϕ b ˆV ϕ ϕ j AS. (4.30) The sngle partcle energes ϕ ĤHO ϕ are calculated usng Eq. (4.11) whle the antsymmetrc Coulomb matrx elements ϕ ϕ j ˆV ϕ ϕ j AS are defned as the ntegrals ϕ p ϕ q ˆV ϕ r ϕ s AS = I pqrs χ sp χ sr χ sq χ ss I pqsr χ sp χ ss χ sq χ sr, (4.31) I klpq 1 4 dr 1 dr 2 ϕ k (r 1)ϕ l (r 2)V (r 1, r 2 )ϕ p (r 1 )ϕ q (r 2 ). (4.32) where the spnors are orthonormal and obey the relaton χ s1 χ s2 = δ s1 s 2. Kvaal (2008) has shown how the ntegrals I klpq can be effcently calculated n Ref. [15]. In our mplementaton, we have used use the open source C++ lbrary OpenFCI, whch s descrbed n the same artcle, to calculate these ntegrals. 28

29 Part II Numercal many body methods 29

30 Quantum many body systems are notorously dffcult to solve, and only the smplest systems, such as the two partcle quanum dot, can be solved on a closed form. More complex systems are always solved usng numercal methods, and n most cases some approxmaton or smplfcaton must be done to reduce the computatonal cost to a realstc level. Typcal examples of such approxmatons are bass truncatons, meanng that the only a subspace of the full state space s used. Another example s the fxed node approxmaton of Dffuson Monte Carlo that we wll dscuss later n ths chapter. In ths part we wll take a closer look at three classes of many body methods, namely Full Confguraton Interacton Theory (FCI), Projector Monte Carlo (PMC) methods and Hartree-Fock (HF) methods. These use dfferent approaches to calculate ground state values of dfferent many body systems, and have dfferent advantages and weaknesses. The reason why we want to present FCI and PMC s that the Full Confguraton Interacton Quantum Monte Carlo Method (FCIQMC), whch we wll present n the next part, can be seen as a hybrd between these approaches. It shares some of the strengths and weaknesses of both methods, and t s therefore nstructve to take a closer look at these frst. The HF method s ntroduced because t provdes a method to untarly transform the spn orbtals to a more favourable bass, whch we hope wll speed up the convergence of our smulatons. 30

31 Chapter 5 The Full Confguraton Interacton method 5.1 Introducton to the method In ths secton we wll ntroduce a numercal method whch s called Full Confguraton Interacton Theory (FCI). Ths s the conceptually smplest of the many body methods and yelds exact solutons to the statonary Schrödnger equaton wthn a gven subspace of the full state space. To be more specfc, FCI solves the egen problem ˆP R Ĥ ˆP R Ψ = ε Ψ, (5.1) where ˆP R s a projecton operator 1 whch projects the full Hlbert space onto a subspace whch we call the FCI space. These subspaces can be defned n several ways, and two commonly used FCI spaces are the energy cut space P R and the drect product space M R, whch n our case are defned as P R = span { D ; D ĤHO D R + 1}, (5.2) M R = span { D = ϕ n1 ϕ n2... ϕ nn ; ϕ n ĤHO ϕ n R + 1}, (5.3) where ϕ are the harmonc oscllator egenfunctons as defned earler. Note that t s not clear whch of these truncatons gve the most accurate result n terms of the dmensonalty of the subspace. Although t can be argued that more physcal relevant states are ncluded n the energy cut spaces, t s not obvous whch of the truncatons that s the most favourable [14]. Accordng to the varatonal prncple, solvng the egen problem Eq. (5.1) s analogous to solvng the mnmzaton problem [8] E(R) = mn( Ψ ˆP R Ĥ ˆP R Ψ ). (5.4) Ths has two mportant mplcatons. The frst s that E(R 1 ) E(R 2 ) when R 1 > R 2, such that the FCI ground state energy s ether mproved or unchanged when we ncrease the sze of the FCI space. Thus the FCI ground state energy can be systematcally mproved by 1 The projecton ˆP onto the subspace span{ ϕ } s defned as ˆP = ϕ ϕ. Note that ˆP = ˆP = ˆP ˆP. The projecton onto the excluded or truncated space ˆQ = 1 ˆP has the same propertes. 31

Fgure 5.1: Modelspaces P 6, M 6 and P 12 for quantumdots wth N P = 1 partcle. Here P N and M N s the energy cut space and the drect product space respectvely, truncated on N + 1 shells.

And secondly, the method s varatonal, such that any FCI energy sets an upper bound to the energes of a system.

32 Fgure 5.1: Modelspaces P 6, M 6 and P 12 for quantumdots wth N P = 1 partcle. Here P N and M N s the energy cut space and the drect product space respectvely, truncated on N + 1 shells. Note that P 6 M 6 M 12 and that n general P N M N M NNP for all N P ncreasng the number of shells n the truncated bass whch makes t possble to check f the energes have converged. And secondly, the method s varatonal, such that any FCI energy sets an upper bound to the energes of a system. Varatonal methods makes t possble to do a systematc search for lower ground state energes, knowng that the lower result always s the better one. Implementatons of the FCI algorthm nvolves large scale dagonalzaton of the Hamltonan matrx H j = D ˆP R Ĥ ˆP R D j. (5.5) The most common method s to use the Lanczos algorthm [8], whch requres at least two vectors of length equal to the dmenson of the FCI space to be stored. The problem s that the dmensonalty of the FCI spaces scales as ( M ), (5.6) N where N s the number of partcles and M s the number of spn orbtals n the bass [12]. Ths lmts the FCI algorthm to small systems wth only a few partcles or crude approxmatons wth a low R. As an example we look at the 6 partcle quantum dot n 20 shells. The bass s fltered such that only the states wth m = s = 0 are ncluded. The fltered bass has a drect product space wth dmensonalty dm(m 19 ) (see Tab. 5.1) whch s already at the absolute upper lmt of what an FCI mplementaton can tackle on a modern super computer [20]. 5.2 Error analyss of the FCI energes Kvaal (2009)[16] has analyzed the truncaton error of the FCI method wth the harmonc oscllator bass n energy cut spaces and wth Coulomb partcle- partcle nteractons. A 32

33 R dm(m R ) Table 5.1: Ths table shows the dmenson of fltered FCI spaces wth dfferent R for a sx partcle quantum dot. By fltered we mean that only the physcally relevant states wth m = s = 0 are ncluded. At R = 20, the dmensonalty s already at the upper lmt of what s possble to smulate wth FCI on hgh level supercomputng facltes. It s worth mentonng that the FCI energy at R = 19 shells wth the harmonc oscllator bass s 20.17H, approxmately 10mH hgher that the true ground state energy at approxmately 20.16H. These results are lsted n chapter. parametrzaton of the approxmate error of the energy as functon of R s also derved. Wthout gong nto detals, we wll present Kvaals error estmate. Kvaal has shown that the followng relaton holds for two dmensonal systems E(R) [1 + ν(r)] r=r+1 (N P + r) r c, (5.7) where E(R) s the error n the energy E( ) E(R) resultng from the bass truncaton, c s a real constant, ν(r) s an unknown functon wth the property ν(r) 0, (5.8) R and N P s the number of partcles. Kvaal has analyzed the error n the energy cut bass, but hs results wll also apply to FCI energes calculated n the drect product bass. The reason for ths s that the drect product bass energes are bounded from below and above by energy cut energes snce P R M R P NP R. (5.9) In other words, the drect product bass error wll be somewhere n the range [ E(R), E(N P R)]. 33

34 The Hlbert spaces M R and P R s much closer n dmensonalty than M R and P NP R. We wll therefore assume that we can use the same error formula for the spaces M R and P R. As we wll see later, ths formula seems to hold very well for the systems we are studyng. We wll also assume that ν(r) s neglgble and rewrte Eq. (5.7) on the form E b r=r+1 (N P + r) r c, b R. (5.10) Ths assumpton s justfed by Kvaal s [16] results. He has performed FCI calculatons on two dmensonal quantum dots n the energy cut bass wth 3, 4 and 5 partcles, and for these systems ν(r) s shown to be small. Also, snce P R M R, we expect ν(r) to decay even faster n the drect product bass. 5.3 Extrapolaton Formulas We have used Eq. (5.10) to parametrze the energy as a functon of the number of shells. R E(R) a b (N P + r)r c, (5.11) r=1 where a, b, c are real constants. Ths formula has an error E(R)ν(R), and consequently t wll become more accurate as R ncreases and ν(r) becomes small. Ths means that we should avod extrapolatons wth energes ˆP R Ĥ ˆP R for small R. We also want to fnd the extrapolated energy at R gven that we know the optmal values of a, b, c. We notce that E( ) can be wrtten n terms of the Remann zeta functon ζ(c) such that lm E(R) a b[n P ζ(c) + ζ(c 1)], (5.12) R where ζ(c) = r=1 r c, whch s known to converge for all c > 1. 34

35 Chapter 6 Projector Monte Carlo Methods 6.1 Introducton The projector Monte Carlo (PMC) Methods ncludes a range of methods whch are talored to calculate propertes of low energy states of quantum mechancal systems, and ncludes algorthms lke Dffuson Monte Carlo (DMC), Green s Functon Monte Carlo and the recently developed Full Confguraton Interacton Quantum Monte Carlo (FCIQMC). These methods are all based on the same theoretcal foundaton, namely that the so called projecton operator ˆP t, wll project out the lowest energy state n the Hlbert space. The projecton operator s defned as ˆP t = e t(ĥ S), S R, (6.1) where S s a real shft of the energy whch we wll soon come back to, and the projected state s Ψ(t) = ˆP t Ψ 0. (6.2) Assumed that the ntal state Ψ 0 has a nonzero overlap wth the ground state, the large tme Ψ(t) wll be proportonal to the ground state. To see why ths s so we consder the general wave functon Ψ = c ψ, (6.3) where { ψ } s the egenfunctons of the Hamltonan Ĥ wth the correspondng egenvalues {E }. The projected state can now be wrtten Ψ(t) = ˆP t Ψ = c ψ e t(e s). (6.4) The vector c ψ e t(e S) wll decay to zero f E > S, wll be constant f E = S and wll ncrease exponentally f E < S. In the case that S = E 0 and the overlap between the ntal state and the ground state c 0 s nonzero, t s evdent that lm ˆP t Ψ = lm t t e (E S)t c ψ c 0 ψ 0. (6.5) 35

36 Also note that n the case of a S ε 0, the ground state wll stll domnate snce the ground state always has the greatest exponent. The basc dea of PMC algorthms s to solve the above equaton by expressng the ground state as Markov chan, and to fnd the steady state soluton teratvely. Ths can be done by notng that ˆP nτ = e (Ĥ S)nτ = ( ˆP τ ) n, (6.6) where τ s a small tme step and n s a postve nteger. By repeatedly multplyng wth the operator ˆP τ we arrve at the large t soluton lm ( ˆP τ ) n Ψ ψ 0. (6.7) n From ths equaton we see that we can express the tme evoluton of the projected state as a Markov chan Ψ (n+1) ˆP τ Ψ (n), where lm n Ψ (n) k ψ 0, k C. (6.8) where the steady state soluton s the ground state. Ths equaton s the startng pont for all PMC algorthms. In general, the dfferent PMC algorthms utlze dfferent methods to approxmate the projecton operator. For example n FCIQMC, the projecton operator s approxmated as a frst order Taylor expanson, whle n DMC t s approxmated as a Green s functon. We wll look closer at two Projector Monte Carlo algorthms, namely DMC and FCIQMC. We wll frst look at DMC and dscuss some of the advantages and shortcomngs of ths algorthm. Ths wll also serve as an ntroducton to the next part where we gve a thorough presentaton of the FCIQMC algorthm. 6.2 Dffuson Monte Carlo In ths secton we wll explan the bascs of the DMC algorthm and also dscuss the man dffcultes and sources of error The projecton operator and the short tme approxmaton DMC fnds the projected state by solvng the ntegral where G s the Green s functon Ψ(R, t) = G(R, R, t)ψ(r )dr, (6.9) G(R, R, t) = R e t(ĥ S) R = e t(e S) ψ (R )ψ (R), Ψ(R) = R Ψ. (6.10) Because the egenfunctons ψ are orthogonal t s straght forward to show that ψ (R)ψ j (R)dR = ψ ψ j = δ j, (6.11) G(R, R, t)ψ(r)dr = c ψ (R)e t(ε S), (6.12) 36

37 whch s the defnton of the projected state n Eq. (6.4). Ths ntegral can be sampled drectly usng Monte Carlo technques, but ths has proven to be an neffcent approach, manly because the Green s functon contans sngulartes that makes the smulatons very unstable [10]. It s known to be a more effcent approach to use a modfed Green s functon and to solve the ntegral where f(r, t) = G(R, R, t)f(r, 0)dR, (6.13) f(r, t) = Ψ T (R)Ψ(R, t), (6.14) G(R, R, t) = Ψ T (R)G(R, R 1, t) Ψ T (R ). (6.15) The tral wave functon Ψ T must be chosen as a wave functon whch closely resembles the ground state for ths approach to be effcent. As we see, Ψ(R, t) has exactly the same tme dependency as before. By assumng that the Green s functon s constant durng a tme nterval τ, t can be shown that t can be wrtten on the form [39] G(R 1, R 2, τ) =Me 1 2τ R 2 R F(R 1) 2 e τ[ E L (R 2 )+E L (R 1 ) 2 S] + O(τ 2 ). (6.16) where M s a normalzaton factor. Ths s known as the short tme approxmaton and s, as we wll dscuss later, one of two man sources of numercal error. The frst factor n eq. (6.16) s a Gaussan dstrbuton where F s the so called quantum force N (R 2, R 1, τ) = e 1 2τ R 2 R F(R 1) 2, (6.17) The second factor s the so called the weght functon where E L s the local energy F(R) = 2 Ψ T (R) Ψ T (R). (6.18) W(R 2, R 1, τ) = e τ[ E L (R 2 )+E L (R 1 ) E 2 T ], (6.19) E L (R) = ĤΨ(R) Ψ(R). (6.20) The tral wave functon s constructed such that E L has no sngulartes, typcally by ntroducng the so called Jastrow factor, snce ths reduces the numercal nstabltes of the DMC algorthm. 37

38 6.2.2 The Dffuson Monte Carlo Algorthm In DMC, the functon f(r, t) s represented as a dstrbuton of walkers wth the coordnates {R } N, such that =1 f(r, t) dr 1 N N =1 δ(r R ), (6.21) whch s only exact n the lmt N. The walkers follow a set of dynamcal rules whch are derved from Eqs. (6.16), (6.13), and wll converge to a dstrbuton whch s proportonal to Ψ T (R)ψ 0 (R). We wll now summarze the DMC algorthm n a few steps to llustrate how t works. (): Frst, the ntal dstrbuton s chosen to be f(r, 0) = Ψ(R) 2, whch should be close to the ground state. Such dstrbutons are typcally obtaned usng varatonal technques lke Varatonal Monte Carlo [10]. A number of N samples (or walkers) {R 0 }N are drawn from ths dstrbuton. =1 (): Durng a sngle tme step, f evolves accordng to the equaton f(r, τ(n + 1)) = dr G(R, R, τ)f(r, τn) = dr N (R, R, τ)w(r, R, τ)f(r, τn). (6.22) In the lmt of a large number of walkers, f(r, τ(n + 1)) s proportonal to the dstrbuton {W(R n+1, R n, τ)} N =1, (6.23) where R n+1 s sampled from the probablty dstrbuton N (R, R n). Instead of storng the weghts W each walker s ether cloned or removed from the smulaton accordng to the branchng rule: Calculate the number M = floor(w(r n+1, R n, τ) + (U 1/2)) where U s a random unform between 0 and 1. If M = 0 the th walker s removed from the smulaton. If M > 1, M 1 new walkers are added to the smulaton n the same coordnate as the th walker. In the lmt of a large number of walkers, ths yelds the correct results snce for any number a. floor(a + (U 1/2) = a, (6.24) (): The shft S should be equal to E 0 to project out the ground state. But snce E 0 s unknown S s adjusted to stablze the populaton of walkers. From Eq. (6.4) we see that a large S results n a populaton growth and vce versa, and that the S that keeps the number of walkers constant s the ground state energy. For ths purpose t s common to use an equaton on the form [10] S S k τ log(n W N P ), (6.25) 38

39 where N W s the current number of walkers, N P s the desred number of walkers and k s some real number whch should be chosen such that the fluctuatons of S are mnmzed. (v): Steps (), () are repeated a number of tmes to thermalze the dstrbuton. After ths step, the dstrbuton should be Ψ T (R)ψ 0 (R). (v): When the thermalzaton s fnshed, one starts collectng observables. As an example, the energy can be found by takng the average of the shft S durng a smulaton. (v): Steps (), () and (v) are repeated untl a suffcent number of samples are collected. There are many detals and possble optmzatons that are left out here, but a thorough dscusson of ths algorthm can be found n Res. [39] Systematc errors DMC s subject to a tme step error whch stems from the short tme approxmaton of the Green s functon, but ths problem can often be overcome by extrapolatng the results to the lmt τ 0. More severe s the so called sgn problem, whch s the man dffculty of DMC and PMC methods n general, and whch s connected to the antsymmetry of fermonc wave functons. As we remember, fermonc wave functons wth dentcal partcles must be antsymmetrc wth regard to the exchange of two partcles, whch means that such wave functons has regons of both negatve and postve sgn. These regons are separated by the nodal surfaces where ψ 0 = 0 and changes ts sgn (has a nonzero gradent). In fact, the product f = Ψ T ψ 0 should be negatve n the regons where Ψ T and ψ 0 have dfferent sgns. From our nterpretaton of f as a dstrbuton of walkers, t s restrcted to be postve everywhere, meanng that our algorthm only can be exact n the case where Ψ T and ψ 0 have exactly the same nodal surfaces or ψ 0 s postve everywhere (bosonc systems). The frst to propose a method to deal wth the sgn problem was Anderson (1975) [1] who set W = 0 (removng the walker) for any step crossng a node of the tral wave functon, thus mposng the nodal surfaces of the tral functon. By usng the fxed node approxmaton, the assumpton s made that Ψ T has the same nodes as ψ 0. Ths s the only fundamental assumpton made n the theory behnd DMC, and f the exact nodes ψ 0 where known, DMC would n prncple be exact for systems of dentcal fermons. The fxed node approxmaton s shown to produce energes that are hgher than the ground state [27], thus the method s varatonal. However, snce there are no systematc methods of mprovng the nodes, the method s not systematcally mprovable and ths means that t s dffcult to control the fxed node error. 39

40 Chapter 7 The Hartree-Fock method 7.1 Overvew The Hartree- Fock (HF) method [12], s an approxmate method to solve the Schrödnger equaton, and can be seen as a mean feld approxmaton where the nteractons between the partcles are replaced by an averaged nteracton. The result s that the many partcle problem s effectvely reduced to a sngle partcle problem where each partcle only sees the mean feld. The HF method s numercally cheap, but snce the method ncludes only selected correlatons, t does n general provde less accurate results than FCI or the DMC. Formally, the HF ansatz assumes that the wave functon can be wrtten as a sngle determnant Ψ HF = ϕ 1,..., ϕ NP, (7.1) where N P s the number of partcles n the system. The HF determnant Ψ HF s the determnant that solves the equaton E HF = mn [ Ψ HF Ĥ Ψ HF ]. (7.2) Accordng to the varatonal prncple, ths s the sngle determnant wth the energy that s closest to the ground state energy. In practce, the HF determnant can be found by varyng the sngle partcle bass {ϕ j } M =1 ϕ = C j ϕ j, (7.3) j where the bass { ϕ j } M =1 that solves Eq. (7.2) s the so called HF bass, and C j s untary n the sense that CjC jk = 1. (7.4) j Note that untarty assures that expectaton values and the normalzaton of the wave functon are conserved. The reason why we are nterested n the HF method s that t can be used to optmze the sngle partcle bass. The HF bass yelds a better startng pont for other numercal methods compared wth the harmonc oscllator bass. The HF bass s routnely used together wth Dffuson Monte Carlo [22], Coupled Cluster Theory [24], Full Confguraton Interacton Quantum Monte Carlo [3] as well as other numercal many body methods. 40

41 7.2 The Roothaan-Hartree-Fock equatons In ths secton we wll take a closer look at how the mnmzaton of the sngle determnant energy can be performed numercally. We wll see that the most costly operaton s the dagonalzaton of an M M matrx, where M s the number of sngle partcle orbtals n the bass. Note that M never becomes very large, and that even for smulatons wth 30 shells, M does not exceed For a Hamltonan on the form Ĥ = ĤHO + ˆV, where ĤHO s an one-body operator and ˆV s a two-body operator, t can be shown that the mnmmzaton problem of Eq. (7.2) can be wrtten as an egenvalue problem [30] Hαγ HF C κγ = e k C κα, (7.5) γ H HF αγ = ϕ α ĤHO ϕ γ + N P α=1 whch can be expressed as a matrx equaton Cαβ C αδ ϕ α ϕ β ˆV ϕ γ ϕ δ AS, (7.6) βδ H HF C = Ce, (7.7) where e s a dagonal matrx and C and H HF are matrces wth elements C j and Hj HF. Ths equaton s a non lnear egenvalue problem that must be solved teratvely. The columns of C and the dagonal elements of e are smply the egenvectors and the egenvalues of the matrx H HF, and can thus be found by dagonalzng H HF. However, H HF s dependent on the elements of C, and must be recalculated every tme C s updated. Ths teratve procedure wll converge to the self consstent soluton when certan stablty crtera are fulflled. The convergence propertes are dscussed n Ref. [12]. 41

42 Part III Full Confguraton Interacton Quantum Monte Carlo 42

43 Full Confguraton Interacton Quantum Monte Carlo (FCIQMC) s an ab nto method whch s rooted n basc quantum mechancal prncples wthout makng any assumptons or approxmatons. The FCIQMC algorthm promses to calculate the exact Full Confguraton Interacton (FCI) value wthn a gven Hlbert space, but usually at a much lower computatonal cost than FCI methods. Smlar to the approach used n DMC, the ground state s found by calculatng the projected state ˆP t Ψ, but the ntegraton s not performed n the coordnate space but n the dscrete space of Slater determnants. Because of the nherent ant- symmetrc propertes of ths bass, the nfamous sgn problem s not as severe for FCIQMC as for other projector methods. See for example Ref. [13] for a thorough analyss of ths subject. The algorthm was frst made famous by Booth et. al. (2009) [3]. They were able to fnd the FCI values for dfferent molucules n very large Hlbert spaces, lke for example the NaH molecule wth an FCI bass of more than determnants. Although these results are mpressve and far out of the scope of FCI, the basc FCIQMC algorthm s lmted due to the unfavourable scalng wth ncreased bass szes. In 2010, an mprovement to the algorthm was proposed by Cleland et. al. [4]. The mproved algorthm, dubbed -FCIQMC, made t possble to do smulatons wth Hlbert spaces vastly larger than the orgnal algorthm could handle. A seres of artcles have snce then been publshed where the new method s appled to dfferent systems, and produce what are beleved to be nearly unbased results. It s worth mentonng the artcle by Shepherd et. al. (2012) [34], where ground state values of the homogenous electron gas wth a Hlbert space wth determnants are calculated. In the followng chapters, we am to gve a thorough ntroducton to the FCIQMC algorthm and the optmzed algorthm -FCIQMC. 43

44 Chapter 8 The FCIQMC algorthm In ths chapter wll use the followng notaton. The general wave functon s wrtten as Ψ = v ψ, (8.1) where {v } are complex coeffcents and {ψ } s the egenfunctons of the Hamltonan wth the correspondng egenvalues {E }. The ground state s expanded n the determnants { D }, whch s constructed from the spn orbtals {ϕ }. ψ 0 = C D, (8.2) the CI coeffcents of the ground state wave functon satsfy the statonary Schrödnger equaton n terms of the egenvalue problem where E 0 s the lowest egenvalue of Ĥ on the FCI bass. 8.1 The mathematcal approach D Ĥ D j C j = E 0 C, (8.3) j From the last chapter we remember that the projected state can be found by teratng over the equaton Ψ (n) ˆP τ Ψ (n), (8.4) where lm n Ψ (n) s proportonal to the ground state ψ 0 f the overlap Ψ (0) ψ 0 s dfferent from zero and the energy shft S of the projecton operator ˆP τ = e τ(ĥ S) s equal to the ground state value E 0. Our frst am s to formulate ths equaton n the FCI bass and n terms of the coeffcents C. The projecton operator s approxmated as the frst order Taylor expanson ˆP τ = 1 (Ĥ S)τ + O([Ĥ S]2 τ 2 ). (8.5) We now apply ths operator on the ntal state expressed on the FCI bass Ψ (n) C (n) D, (8.6) 44

45 and end up wth the expresson C (n+1) D (1 (Ĥ S)τ) C (n+1) C (n) = C (n) (H j j S)τC (n) j j (H S)τC (n) C (n) D (8.7) H j τc (n) j. (8.8) j Here we have used H j = D Ĥ D j. We wll from now on omt the ndex n n our equatons. A DMC nspred approach s used to fnd the steady state soluton of Eq. (8.8). The coeffcents {C } are represented by a populaton of N W walkers α whch are dstrbuted on the determnants D α and wth a sgn s α = ±. The ampltude C s then defned to be C n = s α δ α, (8.9) α whch s a sgned sum where two walkers of opposte sgns on the same determnant wll make zero contrbuton, and the number of walkers s N W = n. (8.10) Smlar to DMC, we defne a set of dynamcal rules of such a nature that the walkers converge to a dstrbuton n C. These dynamcal rules are derved from the two last terms of the rght hand sde of Eq. (8.8) whch descrbes the rate of whch C changes. The frst of these therms s p d ( ) = τ(h S), (8.11) where S s the real shft from the projecton operator and τ s the tmestep. Ths term represents the rate of whch C s projected on t self. Numercally t represents the probablty for a walker to ether be coped (cloned) or to be removed from the smulaton (klled). If p d ( ) < 0 the walker s cloned wth probablty p d ( ) and f p d ( ) > 0 the walker s klled wth the probablty p d ( ). The second term s p d ( j) = τh j, j, (8.12) whch s the rate of whch other C j s projected onto C. We say that p d ( j) represents the probablty that a walker on D j (the parent) spawns a walker on D (the chld). Note that the newly spawned walkers can have both a postve and a negatve sgn. From Eq. (8.8) we see that the chld must have the opposte sgn of ts parent f p d ( j) s negatve, and the sgn same f p d ( j) s postve. If two walkers wth the opposte sgn happen to populate the same determnant D, these walkers add zero to the ampltude C, and both walkers can therefore be removed from the smulaton. The sums over all elements τh j nvolve O(dm(H ) N w ) operatons snce the spawnng probabltes from all walkers to all determnants must be evaluated. Ths s a 45

46 numercally expensve task snce both dm(h ) and N w can be very bg, and for large smulatons we often have that dm(h ) and N W Accordng to Booth et. al. [3] t s numercally more effcent to sample the sum j j τh j C j j τh sj C j /p gen (s j), where p gen (s j) = 1, (8.13) s reducng the number of operatons to O(N w ) for each teraton. Even though the number of teratons before convergence wll ncrease, the cost of each teraton wll become so much cheaper that the overall numercal effcency s mproved. In Eq. (8.13) p gen (s j) s the suggeston probablty of s from j, and ths equaton s only correct when p gen (s j) s dfferent from zero for all connected determnants D j and D s p gen (s j) 0 D j Ĥ D s 0. (8.14) In prncple, there s no other constrant on p gen (s j), and as long as the number of teratons s large enough the smulatons wll eventually converge to the correct results, but n practce certan choces of suggeston probablty dstrbutons wll lead to a more effcent samplng. We wll return to ths topc n a later chapter. 8.2 Populaton control and the statstcal estmators As n DMC, we do not know the shft S = E 0 that gves the desred convergence to the ground state dstrbuton. The populaton of walkers wll ncrease f the shft S s larger than the ground state energy E 0 and decrease f S s smaller than E 0. We can therefore control the populaton by varyng S. We use the formula of Umrgar et. al. [39] whch has been used for populaton control n the context of DMC S S () = S ( 1) ξ τ log N () W N ( 1) W, (8.15) where ξ s a real number of whch the optmal value must be found by experment, and N () W and S () are the populaton and shft at the tme τ. Accordng to Eq. (6.5), the shft S that stablzes the number of walkers s the ground state energy, and ths means that the above equaton provdes us wth a measure of the ground state energy E 0. We use two dfferent statstcal estmators to calculate the energy. The frst s the one that we just mentoned, whch can be taken as the long tme average of the shft N E 0 S = (1/N) S (), (8.16) and whch we wll refer to as the generatonal estmator of the energy. The second s the projected estmator that s calculated by samplng the projected energy E 0 = D 0 Ĥ ψ 0 D 0 ψ 0 =1 = D 0 Ĥ C D D 0 C D. (8.17) 46

47 We rewrte the operator on the form E 0 E P = H 00 + H 0 n / n 0, (8.18) 0 whch means that we need to sample n 0 and n H 0 each teraton. These estmators are statstcal measures, and there wll always be a certan statstcal error assocated wth them. In a later chapters we wll dscuss how the statstcal errors can be calculated. It s also worth commentng that these two estmators, as we wll see, to a large degree are uncorrelated and therefore supplement each other n a valuable way. 8.3 The FCIQMC algorthm and smulaton procedures The frst step of a smulaton s always to ntalze a populaton of walkers. Before we start collectng statstcal samples, the populaton must be brought to the steady state dstrbuton of Eq. (8.4), whch can be done by followng a smple procedure whch we wll now descrbe. Frst, we start wth one walker on the Ferm determnant D 0, and a shft that s larger than E 0 to ncrease the spawnng probablty and obtan a rapd populaton growth. When N W has reached a predetermned value, we start varyng the shft S accordng to Eq. (8.16) to keep the populaton constant. After the populaton has reached the desred number of walkers, the smulaton stll have to run for a whle to assure that the populaton s properly thermalzed. The thermalzaton phase s assumed to be fnshed when the fluctuatons of the nstantaneous values of the projected energy and the shft are stablzed around some energy. For each teraton of the smulaton, the followng fve steps are performed: 1. (De/clone/spawn): Each walker lves permanently on the slater determnant where t was spawned. Cycle over all walkers and do: a. Attempt to kll or clone the walker. Frst p d ( ) = τ(h S) s calculated. If p d ( ) < 0 the walker s klled wth probablty p d ( ) and f p d ( ) 0 the walker s cloned wth probablty p d ( ). b. Attempt to spawn a new walker on a connected determnant D j wth the probablty τ H j /p gen (j ) where p gen (j ) s the generaton probablty of from j. The chld has the same sgn as ts parent f P s s postve and the opposte sgn otherwse. 2. (Annhlate): Annhlate pars of walkers of opposte sgn that lves on the same determnant. 3. (Update S): Two possble strateges: 1) In constant shft (S) mode, do nothng, 2) n constant populaton (N w ) mode, adjust S to keep N w constant. 4. (Sample statstcal estmators): Store samples to calculate the energy. We need to store the shfts S () to calculate the generatonal estmator, and n () 0 and n H () 0 to calculate the projected energy. 5. (Addtonal rule): The effcency of the algorthm can be mproved consderably f addtonal rules for the walker dynamcs are mplemented. The most used optmzaton of the algorthm, -FCIQMC, wll be dscuss n Chapter 9. 47

48 Ths algorthm s also llustrated n the flowchart n Fg Convergence crtera and the tme step error The projecton operator ˆP τ wll always project out the ground state regardless of the sze of the tme step τ, but ths s not the case when the projecton operator s approxmated as the frst order Taylor expanson ˆP τ = e (Ĥ S)τ 1 (Ĥ S)τ. (8.19) To analyze the convergence crtera we can equally well vew the Taylor expanson as our new projecton operator nstead of analyzng the truncaton error. The statonary states ψ are egenstates of the projecton operator as well as ts frst order Taylor expanson, and the egenvalue equatons are Recall that FCIQMC wll project out the state [1 (Ĥ S)τ] ψ = [1 (E S)τ] ψ. (8.20) [1 (Ĥ S)τ]n D 0 = [1 (Ĥ S)τ]n v ψ = [1 (E S)τ] n v ψ, S E 0, v 2 = 1, (8.21) whch wll converge to the ground state only n the case that the absolute values 1 (Ê S)τ are smaller than one for all > 0. Ths sets the upper bound for the tme step τ < τ m = 2/(E m S), E m = max ψ Ĥ ψ, (8.22) where E m s bounded n our truncated FCI spaces. Snce we do not know the value of the excted energes we can not compute τ m. But n general we can conclude that τ m decreases when E m ncreases, as for example when we ncrease the number of partcles, the number of shells or the nteracton strength. In our smulatons, we have used a tmestep τ = 10 3 or smaller, whch means that E m n practce can be larger than H, whch s greater than the most excted energes E m n any of the FCI spaces that we have used. Note that there s no tme step error, and as long as the tme step s smaller than τ m, the smulatons wll converge to the exact ground state. Ths s a consequence of the fact that the projected estmator and ts frst order Taylor expanson have the same egenstates. 8.5 The FCIQMC sgn problem Walker annhlaton s a necessary ngredent of the FCIQMC algorthm, and as we wll show, f we had turned off the annhlaton the smulatons would not converge to the ground state. Thus, when the number of walkers s small and the densty of walkers becomes too low for effcent annhlaton, the smulatons wll no longer yeld the correct results. Note that ths problem does not occur n DMC where most system can be sampled usng a small number of walkers. The reason s that the walkers do not carry a sgn as n FCIQMC, but nstead the sgn structure of the wave functon s forced by applyng the fxed node 48

49 Fgure 8.1: Flowchart that llustrates the basc FCIQMC algorthm. 49

50 approxmaton. To explan the mechansms behnd the FCIQMC sgn problem, we take a closer look at what would happen f we turned off annhlaton. Ths wll also provde an explanaton of why the smulatons do not converge to the ground state when the number of determnants s too low. To analyze ths stuaton we wll follow the dervaton of Spencer et. al. [36]. Assume that n + and n represents the postve and the negatve populaton of walkers on the determnant D. We defne the transton matrx T j = (H j δ j S) = T + j T j, (8.23) where T j + = max (T j, 0) and Tj = mn (T j, 0). From Eq. (8.8) we see that after one teraton, the dstrbutons wll change from n + to n + + n+ and from n to n + n where We rearrange these equatons on the form n + = τ [T jn + + j + Tjn j ], (8.24) j n = τ [T jn + j + Tjn + j ]. (8.25) j n + + n = τ [T j + + Tj ] [n+ j + n j ], (8.26) j n + n = τ [T j + Tj ] [n+ j n j ], (8.27) j whch tells us how the dstrbutons n + +n and n+ n wll evolve. Here n+ ±n represents the dstrbuton wthout annhlaton (+) and wth annhlaton ( ). The steady state solutons are the egenstates of the operators P + P j = 1 + τ j j = 1 + τ j [T + j + T j ], (8.28) [T + j T j ], (8.29) wth the correspondng egenvalues e + and e, and t can be shown that e + e (See Ref. [36] for the proof). Thus, f we turn off annhlaton, n + + n wll become exponentally larger than n + n. It s n prncple stll possble to fnd n + n by subtractng the negatve populaton n from the postve populaton n +, but n practce n + n wll be lost due to statstcal nose. Note that Pj s smply the frst order Taylor expanson of the projecton operator whch means that n + n s the ground state dstrbuton. We have now demonstrated that the sgn problem n FCIQMC stems from the emergence of the domnant egenstate of the operator P +, whle the desred state s the domnant egenstate of the operator P. In a smulaton, the growth rate of these egenstates wll be proportonal to the correspondng egenvalues e + and e. Accordng to Spencer et. al., f e + s much larger than e, a large concentraton of walkers s typcally necessary to acheve a hgh enough rate of annhlaton events. Such systems are sad to have a severe sgn problem, and are not easly smulated wth the FCIQMC algorthm. They also comment that t s dffcult to predct the severty of the sgn problem for a specfc system. A dffcult system s not necessarly a hghly correlated system, but the sgn problem rather depends on the structure and occurrence of negatve elements n the Hamltonan matrx. 50

51 Fgure 8.2: A typcal evoluton of the populaton sze n a smulaton where S s kept constant at a value that s slghtly greater than the ground state. In the frst phase of the smulaton the populaton s too small for effcent annhlaton, and the populaton grows exponentally wth a growth rate e +. When the populaton reaches a certan lmt, annhlaton events become more probable, and the state n + n emerges. The populaton growth s stll exponental, but wth a lower growth rate e, and ths can be observed as a plateau n the populaton sze. If the plateau occurs at a large populaton relatve to the dmenson of the Hlbert space, the sgn problem s severe. 51

52 Chapter 9 Intator-FCIQMC As we dscussed n the last chapter, the number of walkers must be above a certan threshold N C for the populaton to converge to the ground state dstrbuton. Ths threshold s, as we wll see later, system dependent and often close to the number of determnants n the FCI space. In practce ths s the man lmtaton of FCIQMC snce the numercal cost ncrease as a functon of the number of walkers. Cleland et. al. (2010) [4] demonstrated that by ntroducng a set of new dynamcal rules for the walkers, the performance of the algorthm can be mproved drastcally whle stll obtanng results wth a precson of 10 3 Hartree or better. The number of walkers that s necessary n such smulatons are typcally several orders of magntude smaller than N C. The mproved algorthm s called ntator-fciqmc or -FCIQMC, and s dentcal to FCIQMC except that we use dfferent rules to calculate the spawnng probablty. In essence, only walkers that fulfll certan crtera are allowed to spawn on determnants whch are prevously unpopulated, and such walkers are called ntators. Frst, a so called ntator space s defned. Ths s a set of determnants whch are assumed to have a relatvely large average populaton (large ampludes C ). All walkers that lve on a determnant n the ntator space become ntators, and ntators follows the same rules as walkers n the basc FCIQMC algorthm. The frst rule of -FCIQMC concerns the non-ntators and states: (Spawnng rule I): Non-ntators are only allowed to spawn on determnants that are prevously populated. Wth ths rule the Hlbert space s n effect reduced to the states n the ntator space and the sngle and double exctatons of these. Next, two addtonal rules are added to mprove the energy and to gve the walkers access to the entre FCI space: (Dynamc enlargement of the ntator space): All determnants D wth a populaton n > N I are automatcally ncluded n the ntator space for as long the populaton s above ths lmt. Here n s the sgned populaton of D and N s a postve nteger that we wll refer to as the ntator threshold. The ntator threshold s set before a smulaton. Note that wth N I = 1 -FCIQMC s dentcal to FCIQMC. (Spawnng rule II): If the absolute value of sum of all walkers that are spawned on an unpopulated determnant by non- ntators s larger than two, then the spawnng event s accepted. 52

53 Although the ntator adapton of the algorthm has been shown to ncrease the effcency, t s only correct accordng to Eq. (8.8) n the lmt of a large number of of walkers. In ths case the ntator space wll grow to nclude the entre FCI space, and -FCIQMC wll tend to FCIQMC. Ths means that the -FCIQMC energes are systematcally mprovable and that the so called ntator error wll decay to zero when the number of walkers s ncreased. 9.1 The ntator spaces The ntator spaces are defned usng a so called Complete Actve Space (CAS) crteron, whch s often used wthn quantum chemstry together wth methods lke many-body pertubaton theory or the confguraton nteracton method to pck out physcally relevant states of a Hlbert space [12]. The CAS s defned by specfyng two set of orbtals, the frozen and the actve orbtals. The CAS s the set of determnants where: (): The frozen orbtals are doubly occuped (by both a spn up and a spn down electron). (): The rest of the electrons are occupyng one of the actve orbtals. In our smulatons, the frozen orbtals are chosen to be the lowest lyng orbtals n the N F frst shells and the actve space conssts of all orbtals n the N A frst shells. Here N F and N A are preset parameters whch defnes the CAS. See Fg 9.1 for an llustraton. Because of the dynamc enlargement of the ntator space, the fnal result wll be the same regardles of whch CAS we choose. But, as Cleland et. al. [4] has demonstrated, an approprate CAS mght sometmes mprove the convergence. 53

54 Fgure 9.1: These fgures are schematc representatons of the three determnants D A, D B and D C of a four partcle system. Here R s the shell number and m s the magnetc quantum number. The vertcal lnes represents an orbtal that can be occuped by a spn up ( ) and a spn down ( ) electron. The CAS s defned by the frozen orbtals whch are the two spnorbtals wth the lowest energy, and the actve orbtals whch are all states n the second, thrd and fourth shell. The determnant D A s n the CAS snce both of the frozen spn orbtals are occuped. The determnant D B s not n the CAS space snce there are one electron whch s occupyng an orbtal whch s nether an actve orbtal or a frozen orbtal. The determnant D C s not n the CAS snce only one of the frozen orbtals are occuped. 54

55 Chapter 10 Samplng rules 10.1 The suggeston probablty dstrbuton Recall that earler n ths part we ntroduced the suggeston probablty p gen ( j) whch s the probablty of samplng (tryng to spawn a walker on) the determnant D from the determnant D j. Before we can mplement the algorthm t s necessary to fnd a smart suggeston probablty, and there are two man factors that should be consdered. (): From a samplng perspectve, there exst certan suggeston probabltes that are more favourable than others. For example, usng a p gen where the most physcally relevant states have a larger probablty of beng sampled s expected to lead to a faster convergence n terms of the number of teratons. Ths method s commonly referred to as mportance samplng [10]. (): We must fnd a p gen ( j) that can be sampled wth a low numercal cost. As an example of the opposte, assume that we want to use a p gen ( j) where all connected determnants D ( D D j 0) have an unform probablty. Ths mght be effcent from a samplng perspectve but not from a numercal perspectve. The reason s that we would have to explctly fnd all connected determnants whch s a numercally expensve operaton. We have used a strategy that was ntroduced by Booth et. al. (2009) [3], where the suggeston probablty dstrbuton has the form p gen ( j) = (1 p s ) B j (pqkl), for D = ( ) p a pa qa l a k D j, {k, l} {p, q} = 0, p s A j (pk), for D = ( ) p a pa k D, p q, 0, otherwse. (10.1) Here p s s a predetermned probablty whch represents the probablty of a sngle exctaton versus a double exctaton, and A j (pk) and B j (pqkl) are probablty dstrbutons whch we wll look closer at n the next secton. Only sngle and double exctatons are ncluded here snce all pars of unconnected determnants corresponds to a spawnng probablty p d ( j) H j = 0. The samplng wll be correct regardless of the choce of p s, but optmally 55

56 t should reflect the true probablty of a sngle exctaton. Snce the sngle exctatons are more probable than the double exctatons, t s natural to choose a p s that s close to one, but a bad choce of p s would, at least n prncple, only lead to a less effcent samplng and a slower convergence. The dstrbutons A j (pk) and B j (pqkl) are chosen such that they are numercally cheap to sample. But the resultng suggeston probablty dstrbuton p gen ( j) s not based on any samplng consderatons and wll probably not gve the fastest convergence n terms of the number of teratons Samplng of the suggeston probabltes We wll now dscuss the dstrbutons A j (pk) and B j (pqkl) and show how they can be sampled n an effcent manner. In the case of a sngle exctaton, one occuped orbtal ϕ k s chosen wth the unform probablty A (k) = 1/N P, and a new orbtal ϕ p s then pcked randomly from the set S 0 = {ϕ p ; a pa k D H, p k}. (10.2) The condtonal probablty of drawng p s A j (p k) = 1/sze(S 0 ) and the total probablty becomes A j (pk) = A j (p k)a (k) = [N P sze(s 0 )] 1. (10.3) In the case of a double exctaton, two random occuped orbtals ϕ k, ϕ l are chosen wth the unform probablty B j (kl) = ( N 1 P 2 ) 2 = [N P (N P 1)] (10.4) We now need to consder f there are any constants on whch q we can pck. We must fnd the subset S 1 of orbtals that fulflls these constrants and pck a random orbtal ϕ q S 1 wth the probablty B j (q kl) = 1/sze(S 1 ). (10.5) The next step s to follow the same procedure for p. We fnd the subset S 2 of orbtals that fulflls our constrants gven that ϕ q s already pcked, and pck a random ϕ p S 2 wth the probablty B j (p qkl) = 1/sze(S 2 ). (10.6) Snce the probablty of drawng the par (p, q) n general s dependent of whch order they are pcked, we must explctly calculate the probablty of drawng p, q n the reverse order. We follow the same procedure as above and fnd The total probablty becomes B j (p kl) = 1/sze(S 3 ), (10.7) B j (q pkl) = 1/sze(S 4 ). (10.8) B j (pqkl) = B j (kl)(b j (q pkl)b j (p kl) + B j (p qkl)b j (q kl)). (10.9) 56

57 10.3 Fndng the sets S n As many crtera as possble should be set to reduce the number of elements n the sets S n. We must only make sure that p gen ( j) 0 when the spawnng probablty p d ( j) 0, otherwse the samplng wll be erroneous. If p gen 0 and p d ( j) = 0 the spawnng step wll be rejected, and the samplng wll be less than optmal but wll stll gve the correct result. For two dmensonal quantum dots, the followng quanttes are conserved n an nteracton m p = m k, s p = s k (sngle exctaton), (10.10) m p + m q = m k + m l, s p + s q = s k + s l (double exctatons), (10.11) where m and s s the magnetc quantum number and the spn of the th orbtal. We wll use these conservaton rules and outlne a smple procedure to fnd the sets S n. In the case of a sngle exctaton, assume that we have annhlated an orbtal ϕ k, then S 0 s calculated S + 0 = {ϕ, m = m k, s = s k }, (10.12) S 0 = S + 0 /T, (10.13) where T s the set of all occuped orbtals n D j and S 0 + /T s the set S+ 0 where all elements whch are n T are removed. In the case of a double exctaton, assume that we have drawn two occuped orbtals ϕ k, ϕ l wth opposte spns. Then we have the restrctons that If s k s l m k + m l m q R, (10.14) where R s the absolute value of the largest m n our bass. The smallest and the largest allowed magnetc quantum number m MIN, m MAX can be found If s k s l, m MIN = max( R, m k + m l R), (10.15) m MAX = mn(r, m k + m l + R). (10.16) In the case where the two orbtals k, l have the same spns, t s possble to mprove Eq. (10.14) slghtly whch gves The set S 1 s now defned as If s k = s l f s k = s l, m k + m l m p < R, (10.17) m MIN = max( R + 1, m k + m l R), (10.18) m MAX = mn(r 1, m k + m l + R). (10.19) S + 1 = {ϕ ; m [m MIN, m MAX ], s {s k, s l }}, (10.20) S 1 = S + 1 /T, (10.21) 57

58 and the set S 2 s defned as S + 2 = {ϕ, m = m k + m l m q, s = s k + s l s q }, (10.22) S 2 = S + 2 /{T, ϕ q }. (10.23) If ether S 1 = 0 or S 2 = 0, the spawnng event s rejected. The sets S 3 and S 4 are calculated n the same way by nterchangng p and q n the equatons above. Excted state: nm Suggeston probablty/(1 p s ) = B 0 (nm01) ϕ 2 ϕ 5 1/18 ϕ 3 ϕ 4 1/18 ϕ 6 ϕ 11 2/18 ϕ 7 ϕ 10 2/18 ϕ 8 ϕ 9 2/18 ϕ 12 ϕ 19 2/18 ϕ 13 ϕ 18 2/18 ϕ 14 ϕ 17 1/18 ϕ 15 ϕ 16 1/18 ϕ 2 ϕ 17 1/18 ϕ 3 ϕ 16 1/18 ϕ 4 ϕ 15 1/18 ϕ 5 ϕ 14 1/18 Table 10.1: Example of the suggeston probabltes of double exctatons from the reference determnant 01 whch s descrbed by the probablty dstrbuton B 0 (nm01) (see: Eq. (10.1)). The system s a two partcle quantum dot wth four shells n the bass, and the spn orbtals ϕ are enumerated as n Fg Only the states where the total spn and magnetc quantum number are conserved s ncluded. The probabltes B 0 (nm01) are calculated usng the rules of the last secton. Note that the suggeston probablty of a state has no connecton to how physcally mportant t s. 58

59 Chapter 11 Storng and accessng the Coulomb matrx elements The Coulomb matrx elements (CME) v jkl ϕ ϕ j ˆV ϕ k ϕ l AS, (11.1) must be avalable every tme the spawnng probablty s calculated. It s therefore crtcal for the speed of the code to be able to access them quckly. Calculaton of the CME on the fly s very neffcent. Storage of the CME n an array s much faster but t s not straght forward to fnd an effcent way of addressng and accessng them. One way s to store all nonzero matrx elements n one array and the correspondng occupatons jkl n a second array. The address of the matrx element can then be obtaned by fndng the ndex of the correspondng occupatons numbers n the occupatons array. Ths method has the advantage that only the nonzero elements are stored and the drawback that all occupatons must be stored as well and that a numercally expensve search must be performed on the occupatons array. In ths secton we wll show how t s possble to map each of the nonzero matrx elements to a unque address. The advantages are that the expensve search and the storage of the occupatons array s omtted Indexng scheme In ths secton we wll refer to the states ϕ ϕ j as the spnless wave functons, meanng that any of the coulomb matrx elements can have 16 dfferent spn confguratons. Only 6 of these are permtted, namely those where the total spn s conserved. In addton, the matrx elements are unchanged f the spn projecton of the states s polarzed, reducng the number of unque elements to 3. We wll enumerate the three allowed pars of spn confguratons n the followng way n s (jkl) = 0 f (s, s j, s k, s l ) {(,,, ), (,,, )}, (11.2) n s (jkl) = 1 f (s, s j, s k, s l ) {(,,, ), (,,, )}, (11.3) n s (jkl) = 2 f (s, s j, s k, s l ) {(,,, ), (,,, )}. (11.4) 59

60 When we want to specfy the spn confguraton of the orbtals we wll denote the matrx element From the defnton t s clear that v ns(jkl) jkl. (11.5) v jkl = dxdyψ (x)ψ j (y) [ψ l(y)ψ k (x) ψ k (y)ψ l (x)], (11.6) v jkl = v jlk = v klj = v lkj = v jkl = v jlk = v klj = v lkj. (11.7) From the above symmetres we see that most matrx elements can be represented n multple ways. For numercal reasons we want to construct a set where each element s represented only once. Frst we defne the mappng f(jkl) whch s a reorderng of the ndces jkl where only three operatons are allowed to permute the ndces: Changng the postons of the two frst ndces wth the two last. Interchangng the two frst ndces. Interchangng the two last ndces. The reordered ndces must fulfll one of the three equatons below: It can be shown that Case I: Case II: Case III: j k l. < j > k l where < k. < j > k < l where = k, j l. (11.8) f(jkl) = f(jlk) = f(klj) = f(lkj) = f(jkl) = f(jlk) = f(klj) = f(lkj), (11.9) meanng that each unque element gven the symmetres of Eq. (11.7) s represented exactly once n the set of v jkl F where F s the set of all possble combnatons f(jkl). We defne p(jkl) to be the number of permutatons that are necessary to arrange the ndces n the desred order. Any matrx element can now be wrtten v jkl = ( 1) p(jkl) v ns(f(jkl)) f(jkl), (11.10) where we have ncluded the spn confguraton n the superscrpt. 60

61 11.2 Mappng of the ndces to a ponter It s possble make a mappng of the ndces (jkl) to memory addresses n such a way that none of the matrx elements of Eq. (11.10) are represented more than once, and we wll now explan how ths s done. We store all elements wth an address that s stored as (n C++ notaton) (*(P[n_S] + n_g)+n_i), (11.11) where P s an array of double ponters 1 and n_s, n_g and n_i are non-negatve ntegers. Frst n_s s calculated accordng to Eq. (11.2)-(11.4) n_s n s (jkl). (11.12) Next we fnd ( j k l ) f(jkl) accordng to Eq. (11.8)). The ndces j, k, l (case I,II of Eq. (11.8)) or j, l (case III of Eq. (11.8) ) are mapped to the nteger n_g, where the set of all possble n_g conssts of all ntegers n an nterval that start at 0. We wll show how to fnd N_G later n ths chapter. Next we fnd n I whch s the number of orbtals ϕ m wth m < and a magnetc quantum number m = m k + m l m j. (11.13) Note that n_i can be all ntegers between 0 and j n case I, between 0 and k n case II and between 0 and l n case III. Snce both N_G and N_I always occupy all values n an nterval begnnng at 0, all addresses of (11.11) corresponds to exactly one matrx element v ns(f(jkl)) whch means f(jkl) that ths s a space effcent way of storng the CME Graphcal representaton of model spaces The method of mappng the ndces j, k, l or j, l to n_g s nspred by a method that s outlned by Ref. [12], where t s demonstrated how one can map general determnants to an ndex. A short ntroducton to ths method follows. Assume a three partcle state n 0 n 1 n 2 where n 0 < n 1 < n 2 and n [0, 5]. Frst we note that all possble determnants can be represented by one path n Fg All paths that start n the upper left crcle and end n the lower left crcle, and follow the drecton of the arrows through the crcles, represents a determnant. The occupaton number n s equal to the value of n at the crcle where one moves from the row marked (n ) to the next row, and the ndex can be found by summng the arrow weghts (the number above the arrows) along the path. In ths case t s easy to check that all paths corresponds to a unque ndex and that n 0 < n 1 < n 2. As an example, the determnant 124 s represented by the path marked up by the double headed arrows n Fg The ndex of the path s found by summng the arrow weghts and as we see from the fgure the ndex s = 6. We need two rules to construct the tables Fg Frst note that the crcled numbers represents the total number of paths leadng from the upper left crcle. To fnd ths number 1 In C++ ths array s declared as double*** P. 61

62 Case I: n n 0 n 1 n Fgure 11.1: Each path that can be made by followng the arrows from the upper left corner to the lower rght corner represents a three partcle state n 0 n 1 n 2 where n 0 < n 1 < n 2 and n [0, 5]. The occupaton number n s equal to the value of n at the crcle where one moves from the row marked (n ) to the next row. Each such path can be mapped to an ndex, and these ndces occupy all values n an nterval that starts on zero. The ndex can be found by summng the numbers above the arrows along the path, and all paths corresponds to a unque ndex. For example, 124 s represented by the path marked up by the double headed arrows, and the ndex s = 6. one smply adds the numbers crcles from whch one can move to the gven crcle. The numbers above the arrows s the dfference between the crcled number at the start and at the end of the arrow, and appears when one moves from one row to the next n the table. In smulatons, we would ntalze the matrx of the arrow weghts M GRMS = (11.14) The ndex of a determnant (path) n 0 n 1 n 2 can then be calculated M GRMS (n 0, 0) + M GRMS (n 1 1, 1) + M GRMS (n 2 2, 2). (11.15) 11.4 The arrow weght matrces In our case we need to make tables and arrow weght matrces for the three cases of Eq. (11.8) Case I: All paths j k l, j are represented n the Fg The arrow weghts are marked as M,j I. By followng the rules ntroduced n the last secton, t s easy 62

63 to verfy that M I 0, = 0, M I 0, = M I 0, 1 + 1, (11.16) M I,0 = M I 1,0 + M I,j 1 for 0, j 0, (11.17) N I = M I N 0 1,0 + M I N 0 1,1 + M I N 0 1,2, (11.18) where N I s the number of possble paths. The ndex N_G can be calculated by summng the matrx elements N_G = M I j,0 + M I k,1 + M I l,2. (11.19) Case I: n j k l M I 0,0 M I 0,1 M I 0, M I 1,0 M I 1,1 M I 1,2 M I 2,0 M I 2,1 M I 2,2 N o 1 M I N 0 1,1 M I No 1,1 M I No 1,2 N I Fgure 11.2: All paths j k l, j are represented here. See the explanaton n the text. Case II: Fgure 11.3 contans all paths j > k l, < k. The frst row (n = 0) s empty snce < k. In ths case the arrow weght matrx can be constructed by usng the followng rules: The ndex N_G s calculated M0, II = M1, II = 0, (11.20) M,o II = N o 1, for > 1, (11.21) M2,1 II = M2,2 II = N o 1, (11.22) M,1 II = N o 1 for [3, N o 2], (11.23) M II,2 = M II 1,2 + M II,1 for [3, N o 2], (11.24) M II N o 1,2 = M II N o 2,2 + M II N o 2, (11.25) N_G = M II j,0 + M II k,1 + M II l,2 + N I, (11.26) N I s added to start the ndexng at the last ndex f case I. Total number of paths s N II = M II 0,N M II 1,N M II 2,N o (11.27) 63

64 Case II: n j k l M II 20 M II 11 M II 12 M II 30 M II 21 M II 22 M II 31 M II 32 3 N o 3 N 0 2 N o 1 M II No 3,0 M II No 2,0 M II No 3,1 M II No 3,2 M II No 2,1 M II No 2,2 MNo 1,0 II N I MNo 1,2 II 1 N II Fgure 11.3: All paths j > k l, < k are represented here. See the explanaton n the text. Case III: Fgure 11.4 contans all paths j l, < j. The frst row (n = 0) s empty snce = k < j. The arrow weght matrx can be constructed n the same way as M I,j. The ndex N_G can be calculated usng the tho frst columns of M I and subtractng 1 from j and l snce both, j must be larger than zero (see the table below) N_G = M I j 1,0 + M I l 1,1 + N I + N II, (11.28) where N I and N II s added to start the ndexng at the last ndex f case II Memory requrements and numercal speed The memory used to store the CME ncreases rapdly as a functon of the bass sze, but he memory requrements are stll acceptable. For example, for a bass wth 28 shells, there are CME and ponters whch requres about 1.2GB of memory. The array weght matrces are very small, and wth 28 shells the largest of these have the dmensons In our mplementaton of the algorthm, a total of 6 f-tests and a lttle less than 4 array 64

65 Case III: n j l N o 1 M1,0 III M1,1 III 1 N III M III 2,0 M III 2,1 M III 3,0 M III 3,1 M III No 1,0 M III No 1,1 N III Fgure 11.4: All paths j l, < j are represented here. See the explanaton n the text. lookup on average (n the small weght matrces) are necessary to retreve the address of a CME. Ths s the same for all systems regardless of the sze of the bass Generalsaton to other systems It should be straght forward to generalze ths method to other systems.the only requrement s that the matrx elements have the same symmertes (see: Eq. (11.7)) wth regard to the spatal part of the sngle partcle wave functons. Other quantum numbers should n general be possble to account for by redefnng the array n I. 65

66 Part IV Data analyss 66

67 A large amount of data s collected durng an FCIQMC smulaton. The calculated observables are based on statstcal measures, lke the generatonal estmator and the projected estmator whch we ntroduced n the last chapter. These are only nterestng f we can fnd a relable measure of the precson of the results. In the frst chapter, we ntroduce a numercally effcent way of estmatng the statstcal error. Ths method was dscussed n an artcle by Flyvbjerg and Pedersen (1989) [5], and we wll smply refer to t as Flyvbjerg- Pedersen analyss or blockng analyss. We wll frst dscuss ths method n a general context, and then how n can be used to calculate the statstcal error of the generatonal and the projected estmator. We wll then dscuss curve fttng. In secton 5.3, we ntroduced a formula that can be used to extrapolate the energes of fnte Hlbert spaces to the lmt of an nfnte bass. But to do so, we need to ft a parametrzed functon to a set of dataponts. Least square fttng s an effcent method of curve fttng, and we wll ntroduce ths method n the second chapter n ths part. 67

68 Chapter 12 Statstcal analyss A large amount of data s generated durng a smulaton, and observables are calculated on the bass of mean values of dfferent data sets. These estmates are only nterestng f we can fnd a relable measure of the precson of the results. The standard way of dong ths s to calculate the statstcal error ɛ, whch s defned as the standard devaton of the sample mean for a gven number of samples. In ths secton we wll show how the statstcal error can be estmated usng Flyvbjerg-Petersen analyss[5], and also dscuss complcatons that arse when we calculate the statstcal error of the projected energy The statstcal error Assume that we have drawn a number of samples {x, x 2,..., x n } from the statstcal dstrbuton p(x), and that we want to estmate the statstcal error ɛ of the mean f = 1 n n =1 f, f f(x ). (12.1) We further assume that the samplng s ergodc, such that lm n f s equal to the ensamble average f = dx p(x)f(x). (12.2) The mean f s a fluctuatng quantty for real data sets where n <, and the statstcal error ɛ s defned as the standard devaton of f. We frst calculate the varance ɛ 2 = f 2 f 2. (12.3) By nsertng Eq. (12.1) nto Eq. (12.3) t s straght forward to show that ɛ 2 = [ 1 n n =1 f ] 2 1 n n =1 f 2 = 1 n 2 Further manpulaton of ths expresson gve ɛ 2 = 1 n n n 2 γ + 2 =1 n =1 j=+1 n γ j, γ j = f f j f f j. (12.4),j=1 γ j = 1 n 1 n 2 [nγ t=1 (n t)γ t ], (12.5) 68

69 here γ t s the correlaton functon γ t = γ j, t = j, (12.6) where the correlaton only depends on the tme t between the samples. Now we can wrte ɛ 2 = 1 n 1 n [γ t=1 (1 t/n)γ t ]. (12.7) For uncorrelated data sets γ t = 0 for t 0 and the formula becomes very smple, and the expresson for the statstcal error can be reduced to the smple form ɛ = γ 0 n, (12.8) where γ 0 s the standard devaton of f whch can be estmated by takng the standard devaton of the data set {f 0, f 1,..., f n } Flyvbjerg-Petersen analyss The samples from concurrent teratons n an FCIQMC smulaton are heavly correlated. Ths s due to the fact that the dstrbuton of walkers fluctuates slowly around an equlbrum dstrbuton, and therefore we can not use the smple expresson of Eq. (12.8). We wll now ntroduce the Flyvbjerg-Petersen analyss[5], whch s a method where the statstcal error can be found wthout explctly calculatng the correlaton functon γ t and at a low computatonal cost. Assume that the data set {f 1, f 2, f 3,..., f n } s transformed nto a new data set that s half the sze {f (2) 1, f (2) 2, f (2) 3,..., f (2) n/2 } where f (2) = 1 2 (f f 2 ). (12.9) The key pont s that the mean and the statstcal error are nvarant under ths transformaton such that f = f (2), ɛ = ɛ (2). (12.10) When we do ths transformaton repeatedly, we get the data sets {f (2m ) 1, f (2m ) 2, f (2m ) 3, f (2m ) n/2 m }, m = 1, 2, 3, 4,.... (12.11) and t s proved by nducton that all of these data sets have the same mean and statstcal error. As we ncrease m, the correlaton between the samples f (2m ) and f (2m ) +1 wll become smaller. And therefore, as we see from Eq. (12.7), ɛ (2m) = σ (2m) / n/2 m ɛ as m becomes large. (12.12) We now see that we can fnd the error ɛ by calculatng a seres of values ɛ (2m) wth ncreasng m untl t converges to ɛ. The estmated error wll then ncrease untl the pont where t remans fxed wthn statstcal fluctuatons. Ths procedure can be performed by wrtng 69

70 a small computer program that repeatedly transforms the ntal data set accordng to Eq. (12.10) and calculates ɛ (2m). Note that the statstcal error of ɛ 2m can be estmated usng the formula See [5] for a justfcaton of ths formula. σ 2m ɛ 2m n/2 m 1 ± 1 2(n/2 m )). (12.13) 12.3 Flyvbjerg-Pedersen analyss for FCIQMC To calculate the statstcal error of the shft S, we store ts value each teraton, and use blockng analyss to calculate the error after the smulaton has fnshed. Snce we only store one sample each teraton, the amount of data that needs to be stored s small. It s somewhat more complcated to calculate the statstcal error of the projected energy. Ths s due to the fact that the projected energy s a functon of two stochastc varables. Recall that the projected energy can be wrtten E p = j C H 0j n j, (12.14) n 0 where C s the set of the ndces of all determnants that are connected to the reference determnant D 0 C = {j ; D0 Ĥ Dj 0}. (12.15) Each teraton, we save the two quanttes E j C H 0j n () j, and G n () 0. (12.16) The projected energy can be wrtten as a functon of these quanttes E p = N =1 E /N N =1 G /N = N =1 E N =1 G. (12.17) However, we can not use the blockng algorthm on E P because there s no smple way to splt up the fracton of the sums nto blocks. We have nstead sorted the ndvdual stochastc varables n blocks E (n) and G (n) E (n) G (n) 1 n 1 n such that the projected energy can be wrtten E P M =1 E(n) = 1 M =1 G(n) M n E, (12.18) k=n( 1)+1 n G, (12.19) k=n( 1)+1 wth [1, N/n], (12.20) M E (n) =1 G (n) + ξ n, M = N/n. (12.21) 70

71 and the sample varance Var M =1 E(n) M =1 G(n) = Var 1 M = Var 1 M M E (n) =1 M =1 G (n) E (n) G (n) + ξ n + Var [ξ n ] + 2Cov 1 M M =1 E (n) G (n), ξ n. (12.22) In the case that the statstcal fluctuatons of ξ n are much smaller than the statstcal fluctuatons of the sample mean, as we see from the above equaton, the statstcal error can be estmated by dong blockng analyss on the dataset {E (n) /G (n) } N/n =1 threatng E(n) /G (n) as a sngle stochastc varable. We wll now fnd an expresson for ξ n, and show that t n fact can be neglected. The error ξ n can be found by consderng the Taylor expanson of the projected estmator. We let µ E be the sample mean of {E } and µ G be the sample mean of {G }. The multvarate Taylor expanson of the fracton E (n) /G (n) around the pont (µ E, µ G ) s T E (n) G (n) E = T 0 (n) G (n) The zeroth, frst and second order terms are E T 0 (n) G (n) E T 1 (n) G (n) E T 2 (n) G (n) E + T 1 (n) G (n) E + T 2 (n) G (n) +... (12.23) = µ E µ G, (12.24) = 1 (E (n) µ G µ E ) µ E µ 2 (G (n) µ G ), (12.25) G = µ E µ 3 (G (n) µ G ) 2 1 G 2µ 2 (G (n) µ G )(E (n) µ E ). (12.26) G We want to fnd the Taylor expanson of the sum (1/M) M =1 (E(n) ). Ths s a mul- 1,..., G (n) M ), but snce, G (n), t can be wrtten as the sum tvarate Taylor expanson T n the varables (E (n) 1,..., E (n) M, G(n) each term of the sum only depends on one par E (n) The zeroth order term s T 1 M M =1 1 M E (n) G (n) M =1 = 1 M E T 0 (n) G (n) M =1 T E (n) G (n) /G (n). (12.27) = µ E µ G, (12.28) whch s exactly the projected energy. The frst order contrbuton dsappears snce 1 M M =1 E T 1 (n) G (n) = 1 µ G ( 1 M M =1 E (n) µ E ) µ E µ 2 G ( 1 M 71 M =1 G (n) µ G ) = 0. (12.29)

72 Ths means that the leadng term n the error ξ n s the second order term 1 M M =1 E T 2 (n) G (n) = µ E µ 3 G 1 M M (G (n) =1 µ G ) 2 1 2µ 2 G 1 M M (G (n) =1 µ G )(E (n) µ E ) = µ E µ 3 Var(G (n) ) 1 G 2µ 2 Cov(G (n), E (n) ), (12.30) G where Var(G (n) ) s the varance of the data set {G (n) 1,..., G (n) M }N =1 and Cov(E(n), G (n) ) s the covarance of the data sets {G (n) 1,..., G (n) M }N =1 and {E(n) 1,..., E (n) M }N =1. We have now shown that ξ n has a leadng term ξ n µ E µ 3 Var(G (n) ) 1 G 2µ 2 Cov(G (n), E (n) ), (12.31) G whch, for the systems we have treated, s a very small number. Note that typcal values for the means are µ G and µ E /µ G The fluctuatons of ξ n are expected to be much smaller than the fluctuatons of the projected estmator and can therefore be neglected when we calculate the statstcal error. As we can see from Fg. 12.2, ξ n falls off as a functon of n whch means that t s favourable to use a large n. We therefore leave the block sze at one, and ncrease n nstead. Ths s analogous to ncreasng the block sze when ξ n s neglgble, but n the case of a large ξ n we would see a convergence of the statstcal error n terms of ths varable as well. Fgure 12.1: Blockng analyss of the of the projected estmator E P and the generatonal estmator S. We see convergence at 1 ml-hartree for both estmators. The blocksze n s doubled between each data pont. 72

73 Fgure 12.2: The error ξ n as a functon of n, where n s the blocksze. Both of these are defned n Eq. (12.21). The error s n unts of Hartree, and s very small compared to the projected energy whch s close to 20 Hartree. 73

74 Chapter 13 Curve fttng We wll here ntroduce the least square method whch s a method of fttng parametrzed functons to data ponts. Ths secton s to a large degree based on the note on least square methods by Rchter (1995) [28] and the artcle on the Levenberg-Marquardt algorthm by Marquardt (1963) [17] The least square method Assume that we have a set of ponts {(x, y )} n =1, and that we want to ft to the functon y(x, a), a = (a 1, a 2,..., a m ) to these ponts wth respect to the parameters a. A common optmzaton method s the least squares method, where the optmal values for a are found by mnmzng the functon M(a) = n =1 [y(x, a) y ] 2, (13.1) where σ s the standard devaton of the value y. The mnma s found at the pont a 0 where σ 2 [ δm(a) δa ] a 0 = 0 [1, m]. (13.2) We wll frst look at the lnear least square problem where y(x, a) can be wrtten as a lnear functon. Ths s the startng pont for nonlnear least square problems as well snce all functons can be approxmated as a frst degree Taylor polynomal close to a mnma. The lnear y(x, a) s wrtten y(x j, a) = m =1 whch can be rewrtten as the matrx equaton Eq. (13.1) can also be wrtten as a matrx equaton a f (x j ) = a F j, (13.3) y = Fa. (13.4) M(a) = (b Xa) T (b Xa), (13.5) 74

75 where b s a vector wth the components b = y /σ and X s a matrx X j = f j (x )/σ. The mnma s found when Eq. (13.2) s fulflled. Dervaton wth respect to a now yelds 0 = 2X T (b Xa 0 ) X T Xa 0 = X T b a 0 = (X T X) 1 Xb = CX T b. (13.6) Ths problem can be solved effcently on a computer usng standard methods. See for example [8] for a detaled descrpton. The nonlnear problem s typcally solved by usng an teratve algorthm that searches for the mnma of M(a) n the parameter space of a. One class of such methods are the gradent methods, where one repeatedly steps n the drecton of the negatve gradent a a k a M(a), untl the mnma s found. Another class of teratve methods are the Gauss-Newton methods, where the strategy s to expand the nonlnear functon as a frst order Taylor polynomal and solve t as a lnear least square problem. The procedure s repeated untl convergence. The most popular least squares algorthms, lke the Levenberg- Marquardt method [17], combnes the ndvdual strengths of these algorthms. Several hghly optmzed mplementatons of the Levenberg-Marquardt algorthm are avalable as for example the MnPack lbrary, or ts Python wrapper curve_ft n the ScPy lbrary Error analyses of the least square ft In ths secton we wll only look at the nonlnear problem. More specfcally, we want to calculate the error of the extrapolated functon y(x, a) at a pont x based on the standard devatons σ of the data. When we get suffcently close to the mnma a 0, the y(x, a) and M(a) can be wrtten as frst order Taylor expansons y(x, a) = y(x, a 0 ) + (a a 0 ) T d(x, a 0 ), (13.7) M(a) = M(a 0 ) + (a a 0 ) T H(a 0 )(a a 0 ), (13.8) where H(a) has elements H j and d(x, a) has elements d d = If we now substtute f(x) d(x, a 0 ) n Eq. (13.3) we get δy(x, a), H j = δ2 M(a). (13.9) δa δa δa j X j = 1 σ δy(x ) δa j. (13.10) Now t s easy to see that we end up wth the same expresson for the mnma as n the lnear case An nfntesmal change n a 0 s wrtten a = CX T b. (13.11) a = CX T b. (13.12) 75

76 The covarance matrx σ 2 a wth elements σ aj s σ 2 a = a a = C(X T b)(x T b) T C = CX T b b T XC T. (13.13) The matrx b b T has the elements ( y /σ )( y j /σ j ). Here {y } are assumed to be statstcally ndependent varables, whch means that and ( y /σ )( y j /σ j ) = δ j (13.14) b b T = 1, (13.15) σ 2 a = CX T XC T = CC 1 C T = C. (13.16) Ths result allows us to calculate the covarance matrx σ 2 y(x, x, a) = y(x, a) y(x, a) whch n the specal case that x = x s the varance of y(x, a). σ 2 y(x, x, a) = y(x, a) y(x, a) m m = =1 j=1 m m = =1 j=1 δy(x, a) δy(x, a) a a j δa δa j δy(x, a) a a j δy(x, a) δa δa j = d(x, a) T Cd(x, a), (13.17) whch means that the standard devaton of y(x, a) can be wrtten 13.3 Error analyss of E(R) σ 2 y(x, x, a) = d(x, a) T Cd(x, a). (13.18) We wll use Eq from the last secton to fnd the estmated error of the parametrzed energy E(R) defned n Eq. (5.11) at R. The error vector d wth the elements d s d = [ δe δa ] a=a 0. (13.19) For the parametrzed functon for the error E(R) we fnd the elements d : [ δe(r) = 1, (13.20) δa ]a=a 0 [ δe(r) δb [ δe(r) δc R = b 0 (N + r)r ]a=a c 0, (13.21) 0 r=1 R = b 0 log(c 0 )(N + r)r ]a=a c 0. (13.22) 0 r=1 <++> The error s always convergng at a large R. We already know that R r=1 r (c 0 1) converges for c > 2. The sum R r=1 log(r)r c 0 wll also converge snce t s bounded by R r=1 r (c0 1). 76

77 Part V Implementaton of the FCIQMC algorthm 77

78 In ths part we am to gve an overvew of our mplementaton of FCIQMC. The code s wrtten n C++, and s talored to calculate energes of the statonary states of 2-dmensonal open shell quantum dots. It s organzed n separate nterchangeable modules, meanng that t relatvely easly can be modfed to tackle other systems such as 3-dmensonal quantum dots, molecules or atomc nucle, and s fully parallelzed usng a hybrd approach whch combnes multthreadng and parallelzaton wth MPI. The man deas behnd the algorthm s more or less explaned n part III, but there are several aspects of the mplementaton that we want to dscuss further such as the numercal representaton of the determnants and the state vectors, the code structure, parallelzaton, optmzatons etc. In the frst three chapters we gve an overvew of our mplementaton. Frst we wrte about the numercal representaton of the state vector followed by a descrpton of how we have mplemented the basc (sequental) algorthm and how we have mplemented the parallel algorthm. After readng these chapters, one should have a general understandng of how our mplementaton works. In the fourth chapter we look closer at the code. We present the class structure and provde detals about the dfferent classes. Ths chapter should gve a good understandng of how the code s structured and what responsbltes of the dfferent classes are. In the ffth chapter we have presented some benchmarks and dscuss the effcency of the program. We have also looked at dfferent run tme parameters, and how they should be tuned to optmze the performance of the code. The sxth and last chapter contans a short dscusson of how the code should be modfed n we want to smulate dfferent systems than the two dmensonal quantum dots. 78

79 Chapter 14 Numercal representaton of the determnants, the bass and the state vector 14.1 The determnants A large part of the of the code s devoted to manpulatng, sortng, readng from and transferrng nformaton about determnants and ther populatons. It s therefore crucal for the effcency of the code that we fnd a smart way of representng them numercally, whch s both space effcent and computatonally fast. We have chosen to store the determnants as bnary strngs, b ϕ0, b ϕ1,..., b ϕ2m, wth the same number of elements as the number of spn orbtals 2M that are accessble n the bass. The bt b ϕ here represents the th orbtal, and s set to 1 f the orbtal s occuped and 0 otherwse. As an example, the state D = ϕ 0, ϕ 1, ϕ 3, ϕ 5, ϕ {ϕ 0, ϕ 1, ϕ 2,..., ϕ 2M } (14.1) would be represented as the bnary strng D b ϕ0 b ϕ1 b ϕ2 b ϕ3... b ϕ2m = (14.2) Note that n the drect product bass, the number of possble combnatons of b ϕ0,..., b ϕ2m s exactly the number of determnants n the bass, whch means that ths s the most space effcent way of storng the determnants. We have chosen to use the C++ contaner class sdt::btset to store the bnary strngs. Ths class s optmzed to be space effcent, and each element occupes only one bt of memory. Note that bool, whch probably s the most used type for storng bts, usually occupes 8 bts memory per bt of nformaton. Low memory usage s an mportant consderaton when optmzng the code for parallel computatons snce we want to mnmze the communcaton overhead. For numercal reasons that wll be made clear n the next chapter, we want to keep the determnants ordered. We have therefore defned the operator < as ϕ a1 ϕ a2,..., ϕ anp < ϕ b1 ϕ b2,..., ϕ bnp only f 2 2M a M a M a N P < 2 2M b M b M b N P, (14.3) 79

80 and we defne the determnants { D n1, D n2,... } to be sorted when D n < D n+1. In the rest of ths thess, when we say that a determnant s greater or lesser than another determnant or that a set of determnants are sorted, we are referrng to ths equaton The sngle partcle bass We keep 2M spn orbtals n our bass, each dentfed by ts ndex [0, 2M 1]. Each spn orbtal ϕ s unquely represented by the trplet (m, s, n), where m s the magnetc quantum number, s s the spn and n s the prncpal quantum number. Gven the ndex, the trplet (m, s, n) s stored n the followng way: (): ϕ has spn f s odd and f s even. (): The quantum numbers n and m are stored n two arrays of length M. nt *p_m = new nt[m]; nt *p_n = new nt[m]; The magnetc quantum number of orbtal s stored as the floor(/2) th element n p_m, and the prncpal quantum number s stored as the floor(/2) th element n p_n The state vector The state vector s represented by the sgned populaton of walkers that populate dfferent determnants. Assume that N D s the number of populated determnants. We store the determnants as the N D frst elements of a btset array. btset<n> *pb_determnants = new btset<n>[_ndmax]; And the sgned populaton of walkers s stored as the N D frst elements of a long array. long *pl_populaton = new long[_nwmax]; Here the populaton of pb_determnants[] s stored as pl_populaton[]. The number of populated determnants N D wll vary from one teraton to the next, but usually only a fracton of the total number of determnants n the bass are populated. So every teraton these arrays have to be rearranged, such that the populated determnants always are stored as the N D frst elements. 80

81 Chapter 15 A sngle FCIQMC teraton The am of ths chapter s to explan how the sequental FCIQMC algorthm has been mplemented. We wll do ths by gong through a sngle teraton of the algorthm, whch ncludes the kll/clone, spawn and annhlaton step. Assume that we start wth a populaton of walkers {n 0, n 1,..., n ND } where n s the sgned populaton of the determnant D. As we already have dscussed, the populaton s represented by two vectors pb_determnant[m] D m, (15.1) pl_populaton[m] n m. (15.2) We further assume that the determnants are ndexed and sorted such that D m < D n f m < n. Durng an FCIQMC teraton, we loop through all determnants D α, α = 1, 2,..., ND, (15.3) where each determnant s constructed from the orbtals {ϕ α1,..., ϕ αnp } D α = ϕ α1,..., ϕ αnp. (15.4) A: The kll/clone and spawn steps: For each element we perform the followng steps: 1: Calculate the kllng/clonng probablty: We frst calculate and store the dagonal matrx element D α Ĥ D α whch s gven by Eq. (15.6) D α Ĥ D α = p_dp D α Ĥ D α, (15.5) N P α ĤHO ϕ α + =1 ϕ 1 N P 2 j=1 The kll/clone probablty s now calculated accordng to Eq. (8.11). d_pd = d_dt*(d_pd-ds); Here d_dt τ s the tmestep and d_ds S s the shft. ϕ α ϕ αj ˆV ϕ α ϕ αj AS. (15.6) 81

82 2: Loop over all walkers on D α : We then enter a loop where we do the kll/clone and spawn step for all walkers on the current determnant D α. We let and enter the loop: for (; _loopw>0; _loopw--) _loopw n α, (15.7) 2a: Perform the spawnng step: We attempt to spawn walkers on a connected determnant D β. As we dscussed n chapter 10, we let D β be a sngle exctaton of D α wth probablty d s and a double exctaton wth probablty (1 d s ). We draw a random unform χ [0, 1] and f χ < d s we try to spawn walkers on a determnant and else we try to spawn walkers on a determnant D β = ( 1) P a pa k D α, (15.8) D β = ( 1) P a pa qa k a l D α, (15.9) where k, l, p, q are sampled accordng to the rules that we set up n chapter 10, and P s the number of permutatons that are necessary to rearrange the annhlaton and creaton operators. We spawn n spawned new walkers, where and p d (β, α) s calculated accordng to Eq. (8.12) n spawned = floor ( p d (β, α) + 1). (15.10) p d (β α) = τ D α Ĥ D β /p gen (β α). (15.11) Here p gen (β α) s the samplng probablty of determnant D β from determnant D α whch we also dscussed n chapter 10. The off dagonal matrx element D α Ĥ D β s calculated accordng to Eq. (4.29) D α Ĥ{a pa k } D α = N P j=1 n the case of a sngle exctaton and Eq. (4.30) ϕ p ϕ αj ˆV ϕ k ϕ αj AS, (15.12) D α Ĥ{a pa qa k a l } D α = ϕ p ϕ q ˆV ϕ k ϕ l AS, (15.13) n the case of a double exctaton. If a spawnng event s successful, we store the newly spawned walkers on the arrays: pb_new and ppl_new. The frst s an 1 n max btset array whch stores the determnant D β, and the second s a 2 n max long array that stores the sgned number of new walkers, and the nteger n max s a preset value that 82

83 sets the sze of the array. The newly spawned walkers and whether ther parents were ntators, are stored accordng to the followng rules: pb_new[l_nnew] D β (15.14) f D α s an ntator: (15.15) ppl_new[0][l_nnew] n spawned, ppl_new[1][l_nnew] 0, else f D α s a non-ntator: ppl_new[0][l_nnew] 0, ppl_new[1][l_nnew] n spawned, (15.16) l_nnew l_nnew + 1. (15.17) Here l_nnew s an nteger that we use to count the number of newly spawned walkers. Ths parameter s set to zero at the begnnng of each FCIQMC teraton. Note that accordng to the spawnng rules of -FCIQMC, we need to know f the parent of a walker s an ntator, the total number of non- ntators tryng to spawn on the same determnant and whether D β was prevously populated before we can say f the spawnng event was successful. Consequently, we do not know whether a spawnng event was successful before after an teraton. 2b: Perform the kll/clone step: We try to kll or clone a walker wth the probablty d_pd P d (α α). Remember that d_pd s already calculated and stored, and s equal for all walkers on the current determnant. We now attempt to kll or spawn walkers accordng to the followng rules: frst draw a random unform χ [0, 1] (15.18) f d_pd > χ: pl_populaton[α] = sgn(n α ) floor( d_pd + 1), (15.19) else f d_pd > χ: pl_populaton[α] + = sgn(n α ) floor( d_pd + 1). (15.20) where n α s the sgned populaton on D α. B: The annhlaton step: When we have run through the prevous steps for all determnants, we must annhlate all pars of walkers of opposte sgns that populate the same determnant. We have chosen to do ths n the followng way: We frst sort the arrays that contan the newly spawned walkers pb_new, ppl_new. The determnants array are sorted n ascendng order, and the populaton arrays are reordered n exactly the same way. Note that these arrays mght contan equal determnants, n whch case we sum the populatons and remove the duplcate. For example, t mght happen that pb_new[] = pb_new[j], (15.21) 83

84 and n ths case we sum ppl_new[0][] + = ppl_new[0][j], (15.22) ppl_new[1][] + = ppl_new[1][j], (15.23) remove pb_new[j], pb_new[0][j] and pb_new[1][j] from the smulaton and subtract one from l_nnew (remember that l_nnew s the total number of spawnng events). After ths step we end up wth a sorted determnants array pb_new[] < pb_new[+1] for all [0, l_nnew-2], (15.24) and where ppl_new[0][] and ppl_new[0][] contan all walkers spawned on pb_new[] from ntators and non- ntators respectvely. The next step s now to merge the arrays of old and newly spawned walkers. We make use of temporary arrays pb_temp and pl_temp, and copy both the old and new determnants to these arrays. Ths must be done accordng to the rules of the ntator adapton of FCIQMC. Ths part of the algorthm s most easly understood by readng the C++ code whch we have pasted below. Here l_ndet s the number of elements n the arrays that stores the old determnants and l_nnew s the number of elements n the sorted lst of newly spawned walkers. long = 0, j = 0, n = 0, l_temp; whle ( (<l_ndet) && (j<l_nnew) ) { f <... pb_determnants[] == pb_new[j]... > { pb_temp[n] = pb_determnants[]; //old determnants pl_temp[n] = pl_populaton[]; // here all attempts to spawn s accepted snce the // determnant was prevously populated pl_temp[n] += ppl_new[j][0]; pl_temp[n] += ppl_new[j][1]; ++; j++; } else f <... pb_determnants[] > pb_new[j]... > { l_temp = 0; // ntators are always allowed to spawn l_temp += ppl_new[j][0]; // non ntators are only allowed to spawn f // the absolute value of the sum of new walkers // s larger than one f ( abs(ppl_new[j][1]) > 1 ) { l_temp += ppl_new[j][1]; } pb_temp[n] = pb_new[j]; pl_temp[n] = l_temp; j++; } 84

85 } else \\( <... pb_determnants[] < pb_new[j]... > ) { //old determnants pb_temp[n] = pb_determnants[]; pl_temp[n] = pl_populaton[]; ++; } // f the populaton s larger than 0, accept the determant f ( pl_temp[n]!= 0 ) { n++; } Now, ether all old walkers or all new walkers are coped to the temporary arrays. The next step s to copy the rest of the walkers: f (==l_ndet) { whle (j<l_nnew) // all old walkers are already coped { pb_temp[n] = pb_new[j]; pl_temp[n] = ppl_new[j][0]; f ( abs(ppl_new[j][1])>1 ) { pl_temp[n] += ppl_new[j][1]; } j++; f ( pl_temp[n]!= 0 ) { n++; } } } else f ( j==l_nnew ) { whle ( <l_ndet) // all new walker are already coped { pb_temp[n] = pb_determnants[]; pl_temp[n] = pl_populaton[]; ++; f ( pl_temp[n]!= 0 ) { n++; } } } Now, all determnants are contaned n the n frst elements of the temporary arrays. We fnsh by swappng the ponters of the temporary arrays and the populatons and determnants arrays, and reset the number of walkers and determnants. 85

86 //reset number of determnants l_ndet = n; l_nnew = 0; //Ponter swappng btset<n> *pb_sswap = pb_determnants; pb_determnants = pb_temp; pb_temp = pb_swap; long *pl_swap = pl_populaton; pl_populaton = pl_temp; pl_temp = pl_swap; C: Varyng the shft: The last step s to vary the shft S to stablze the populatos. We assume that the old number of walkers s already stored as l_oldnumwalkers. The new number of walkers l_numwalkers can be calculated: l_numwalkers = 0; for (nt =0; <l_ndet; ++) l_numwalkers += abs(pl_populaton[]); The new shft s now calculated accordng to Eq. (8.16). 86

87 Chapter 16 Parallelzaton 16.1 Introducton Parallelzaton allows smulatons on multcore computers or clusters of nterconnected computers or nodes. There are two man models of parallel computng, the shared memory model and the dstrbuted memory model, whch are based on dfferent underlyng memory archtectures. Shared memory programs communcate by changng shared memory varables and dstrbuted memory programs communcate va message passng. On multcore computers, we can choose whether to use a shared or a dstrbuted memory model. On computatonal clusters, where the memory s only accessble to the cores on each node, a dstrbuted memory model must be chosen. When we mplemented FCIQMC, we found that t was necessary to use a composte approach combnng the shared and the dstrbuted memory models. For smulatons wth a large set of bass functons, the amount of memory needed to store the Coulomb matrx elements s so large that at least some of the cores have to share memory. Dstrbuted memory parallelzaton was mplemented because we wanted the code to run on computatonal clusters wth many nodes. FCIQMC s n prncple a hghly paralellzable algorthm. The most tme consumng part of the algorthm s the clone/kll and spawn steps whch must be calculated for a large number of walkers. These calculatons are ndependent of each other and can therefore be run n parallel wthout any communcaton between the processes. The annhlaton step does however nclude a lot of communcaton and therefore nduces a sgnfcant communcaton overhead. In ths chapter we wll dscuss these ssues along wth other detals of the parallelzaton of algorthm Numercal lbrares We have used the lbrary OpenMP[21] whch s an nterface for shared memory parallelzaton. More specfcally, t s an mplementaton of multthreadng whch s a method where several lghtweght processes or threads are started and allocated to dfferent cores. OpenMP has been mplemented usng a coarse graned model where the threads only share a mnmum of objects and varables, as for example the Coulomb matrx elements and the lst of walkers. Ths was done to mnmze the number of locks and mutex es (crtcal and atomc clauses n OpenMP) that are necessary to avod conflctng read and wrte operatons to the same memory locatons. 87

MPI s an abbrevaton for Message Passng Interface and s a lbrary specfcaton whch provdes an nterface for synchronzaton and communcaton between processes wth dstrbuted memory.

88 MPI s an abbrevaton for Message Passng Interface and s a lbrary specfcaton whch provdes an nterface for synchronzaton and communcaton between processes wth dstrbuted memory. The MPI standard defnes the syntax and the semantcs of a number of lbrary routnes and has several mplementatons, among others OpenMPI[7] whch we have used n our code The dstrbuton of walkers on the MPI tasks Due to the annhlaton step n the FCIQMC algorthm, we need a fast method to search through the populaton and fnd all walkers that resde on any gven determnant, and ths becomes more dffcult when we run the code wth many MPI tasks. Ths problem can be solved by dstrbutng the determnants n a specfc way whch we wll now dscuss. We assume that the determnants are enumerated such that D < D +1, (16.1) where the operator < s defned n Eq. (14.3), and that the walkers are dstrubuted on multple MPI tasks whch are enumerated 1, 2, 3,.... The determnants { D 1, D 2,... }, are sorted and dstrbuted such that only walkers that populates determnants D wth n some nterval [M(n 1) + 1, M(n)] populates MPI task n. The array V = ( D M(1), D M(2),..., D nd ), (16.2) now keeps the nformaton about where a determnant, f t s populated, s stored. To make ths nformaton accessble to all processes we dstrbute V to the MPI tasks. Fgure 16.1: Ths fgure shows schematcally how the determnants are dstrbuted on the MPI tasks. The determnants are sorted such that all determnants on MPI task are larger than D M( 1) and smaller than or equal to D M(). 88

89 16.4 Load balancng The process of dstrbutng the walkers such that the workload s the same on all threads s referred to as load balancng. All threads have to wat for the last thread to fnsh at the end of an teraton, whch means that the numercal effcency wll suffer f the computatonal load s not evenly dstrbuted. The load balancng of the OpenMP threads and the MPI tasks follow the same prncples, and n ths secton we wll refer to both the MPI tasks and the threads as processes. Each process receves a number N () () W of walkers that populates a number N D of determnants. Ideally, the walkers should be dstrbuted n such a way that all processes spend the same tme processng the walkers. However, ths s not straght forward snce the CPU tme needed to process one walker s dependent on many dfferent factors. We have assumed that the computaton tme for a set of N () () W walkers lvng on N D determnants s t CP U (N () W, N () D ) k W N () W + k DN () D + const, (16.3) where k W, k D are mplementaton and system dependent parameters. We thnk ths s a reasonable assumpton snce the number of calls to the most costly functons, whch are the functons that calculate the kll/clone and spawn probabltes, are lnearly dependent of N () () D and N W. Accordng to Eq. (16.3), the walkers should deally be dstrbuted to the processes =, 1, 2, 3,..., N procs n such a way that N (a) W k W + N (a) N procs D k D = =1 (N () W k W + N () D k D)/N procs, a [1, N procs ]. (16.4) The constants k W and k D are found emprcally. In practce we set k W = 1 and optmze k D. Ths parameter s system dependent, and the optmal value should deally be found for each new smulaton The program flow In ths secton, we wll look at the work flow and how the walkers are admnstrated by the parallelzed program. Together wth the last chapter where we descrbed how the sequental algorthm s mplemented, ths secton should gve a more or less complete pcture of how we have mplemented the FCIQMC algorthm. We do not provde programmng detals or code examples here, but these can be found n the next chapter where we dscuss the C++ code. In fgure 16.2, one teraton of the code s schematcally llustrated n four steps A,B,C and D, and these steps are explaned n detal below. A: We start wth a certan number of walkers on each MPI task. The walkers populate the determnants D 1, D 2,..., whch are sorted and dstrbuted on the dfferent MPI tasks. The MPI tasks are load balanced n accordance wth Eq The frst step s, on each MPI task, to dstrbute the walkers to the threads n such a way that they are load balanced. B: Each thread performs the clone/kll and spawn steps on ts lst of walkers by usng the sequental algorthm. 89

90 C: After the prevous step, all MPI tasks now have two lsts of walkers. The frst s the old lst of walkers mnus the klled walkers and plus the cloned walkers. These walkers are stuated on the correct MPI task snce the cloned walkers populates the same determnants as ther parents. The second lst conssts of the newly spawned walkers whch populates dfferent determnants than ther parents. These walkers must be sent to the MPI task where ther resdent determnants are lsted, sorted and merged wth the old lst of walkers. As we dscussed n the last chapter, the newly spawned walkers are stored n the arrays pb_new and pl_new. And t s the elements of these arrays that must be redstrbuted. After the redstrbuton, the walkers are merged wth the old populatons and determnants arrays n the same way as we descrbed n the last chapter 1. D: The number of walkers and determnants on the dfferent MPI tasks may now have changed. The MPI tasks are load balanced by redstrbutng the determnants. Because of the way the determnants are sorted, only the frst/last determnants are transmtted, and only to or from the last/next MPI task. Except for the addtonal steps that must be taken to redstrbute the newly spawned walkers and load balance the nodes and the threads, the parallel algorthm s more or less dentcal to the sequental algorthm. 1 We dscussed ths n the last chapter under pont B (the annhlaton step). 90

91 91 Fgure 16.2: Ths fgure s explaned n the text n secton 16.5

92 Chapter 17 Organzaton of the code and the classes In ths chapter we wll look closer at the C++ code, and show how the code s structured. Note that wth excepton of the man functon, there s no part of the code that s not part of a class. In ths chapter each of the classes has ts own secton where we descrbe the functonalty of the class, what classes t nstantates and so on Overvew The code s organzed n the followng classes runsmulaton: Ths class contans the man loop and coordnates the other classes. walkercontanerclass: Keeps nformatons about the populaton of walkers. sortwalkers: Sorts the new walkers and merges them wth the lst of old walkers. loadbalancethreads: Dstrbutes the walkers to the threads. walkerdstrbuton: Dstrbutes the walkers to the MPI tasks. walkerpropagator: Runs through a lst of walkers and performes the kllng clonng and spawnng steps. hamltonanelements: Calculates the hamltonan elements D Ĥ D j. lbgrie: Tabulates and stores the Coulomb matrx elements. nputvars: Reads and stores the preset run parameters. ntsmulaton: Intalzes the bass and some run tme parameters. In addton, the lbrary OpenFCI [15] are used to calculate the Coulomb ntegrals, and the lbrary CRandomMersenne [6] to generate random numbers. The random number generator that we have used s an mplementaton of the Mersenne Twster algorthm [18] whch s consdered to be a fast algorthm that produces random numbers of hgh qualty. The smulator s wrapped n a python code that comples the C++ code, starts smulatons, organzes, analyzes and plots the output data etc. 92

93 Fgure 17.1: UML Class Dagram. Some classes are not ncluded lke ntsmulaton, CRandomMersenne, OpenFCI and nputvars. 93

94 17.2 The ntsmulaton class Ths class ntates the lbgre class whch contans the Coulomb matrx elements and the arrays p_m and p_n whch contans the quantum numbers of the sngle partcle orbtals. It also sets the seed 1 of the random number generator and some other run tme parameters The runsmulaton class Ths class coordnates the other classes and contans the man loop of the program. The classes walkercontanerclass and lbgrie are nstantated before we enter the man loop, and are shared between all threads. We then enter the functon run whch can be summarzed as follows: (I): A number _numthreads of threads are spawned and the program enters the parallel secton. (A): The class walkerpropagator are nstantated wthn each thread. Each of these classes receves a ponter to the lbgrie object to be able to access the shared Coulomb matrx elements. (B): Each thread allocates memory for ntermedate storage of newly spawned walkers. Ths s done to prevent data races among the threads. (C): The program enters the man loop (): Each thread calls the walkerpropagator::snglestep() functon to do a sngle FCIQMC teraton. (): The newly spawned walkers are wrtten to the shared arrays ppd_new and pb_new of the class walkercontanerclass. (): The program enters a sequental secton where only one of the threads does work (a): The walkercontanerclass object redstrbutes the new walkers to the dfferent MPI task. It then loadbalance the MPI tasks, and the threads of the ndvdual tasks. (b): Statstcs are collected from all MPI tasks and wrtten to an output fle The walkercontanerclass class Ths class holds four publc arrays and one publc vector that stores all nformaton about the current populaton of walkers, the newly spawned walkers and the load balancng of the threads. It nstantates the classes walkerdstrbuton and sortwalkers, whch sorts the walkers, redstrbutes the walkers on the dfferent MPI tasks and merge the lst of old and new walkers. The state vector s represented by two arrays. The frst s a btset array whch stores the populated determnants, and the second s a long array whch stores ther populatons. 1 The random number generator s ntalzed wth an nteger that s called a seed. The sequence of random numbers that are produced by the random number generator depends on the value of the seed. 94

95 btset<n>* pb_determnants = new btset<n>[_ndmax]; long* pl_populaton = new long[_nwmax]; The varable l_ndet tells us how many dfferent determnants that have a non zero populaton. These determnants are always stored n the l_ndet frst entres of pb_determnants and pl_populaton. The newly spawned walkers are stored on the two arrays btset<n>* pb_new = new btset<n>[_maxnspawn]; long** ppl_new = new long[_maxnspawn]; Assume a spawnng event where n walkers are spawned on the determnant D j from the determnant D, and that ths s the _nnew th spawnng event. The sgn of n can be postve or negatve dependng of the sgn of the spawned walkers. The nformaton about the spawnng event s stored n the followng way: pb_new[_nnew] D j. If D s an ntator: If D s a non-ntator: ppl_new[_nnew][0] n ppl_new[_nnew][1] 0 ppl_new[_nnew][0] 0 ppl_new[_nnew][1] n These arrays must be avalable when the new lst of walkers are merged wth the old lst of walkers. Often, a walker s spawned on a determnant that s lsted on a dfferent MPI task, and there s no way of fndng out f the spawnng event was successful untl the new walkers are redstrbuted. The loadbalancng of the threads are calculated by the class loadbalancethreads and the nformaton about how the walkers are dstrbuted s stored on the vector vector<loadbalthr> wdstthr; whch has the same number of elements as the number of OpenMP threads. wdstthr[] contans the nformaton about whch walkers that are processed by thread and s an nstance of the struct struct loadbalthr { unsgned long frst; unsgned long ndet; long nwfrst_thrd; long nwlast_thrd; }; where frst s the ndex of the frst determnant to be processed, and ndet s the number of determnants. The walkers on the frst and the last determnant on a thread can be splt between two threads. nwfrst_thrd and nwlast_thrd s the number of walkers on the frst and the last determnant that should be processed by the current thread. wdstthr s reset every tme the functon loadbalance s called. 95

96 17.5 The sortwalkers class Ths class sorts the spawned walkers and merges the lsts of new and old walkers. The functor LessThan fnds the smallest of two determnants. Ths s a class where the operator () s overloaded wth functon that compares two determnants gven ther ndces. class LessThan { prvate: btset<n> b_tmp; unsgned nt l_tmp; btset<n>* pb_dets; publc: LessThan (btset<n>* pb_dets) { ths->pb_dets = pb_dets; } bool operator() (const unsgned long lhs, const unsgned long rhs); }; The comparson of the determnants can be effcently performed by usng bt operatons. The next example shows how the xor operator (represented by ^ n C++) can be used to fnd the smallest of the two determnants b_lhs and b_rhs. bool operator()(const unsgned long lhs, const unsgned long rhs); { b_tmp = (pb_dets[rhs]^pb_dets[lhs]); l_tmp = b_tmp._fnd_frst(); f (l_tmp==n) return false; else f (pb_dets[rhs].test(l_tmp)) return false; else return true; } Here _Fnd_frst() returns ether the ndex of the frst bt equal to 1, or N f all bts are 0, where N s the template parameter of the btset objects whch equals the number of spn orbtals n the bass. The functon lhs.test(k) returns true f the k th bt of lhss set, and false otherwse. The class LessThan s nstantated wth a ponter to the frst frst determnant n pb_determnant or pb_new. Ths allows us to test f determnant s smaller than determnant j usng the followng notaton LessThan lessthan(&pb_determnant[0]); bool b_lessthan = lessthan(,j); Instead of sortng the btset array drectly, we now sort an array ul_ndex wth ordered ndces n the range [1,..., N D ], where N D s the number of determnants that are to be sorted. The order of he sorted ndces tells us how to reorder the determnants afterwards. Ths s faster and more practcal for many reasons. Most mportantly, when N s large t s much more tme consumng to sort the btset array drectly smply because more data needs to be moved around. 96

97 The functon orderindces ntalse ul_ndex array and sort t by usng the C++ Standard Template Lbrary functon std::sort 2. for (unsgned long =0; <n; ++) ul_ndex[] = ; std::sort(&ul_ndex[0], &ul_ndex[n], LessThan(pb_new) ); Note that we pass the functor LessThan as an argument. Ths s a method to overload the operator < wth the operator () of the LessThan class. The functon reorderwalkers reorder the newly spawned determnants after the array ul_ndex s sorted. Determnants that s represented more than once are merged and the ther populatons are summed. To do the reorderng of the arrays effcently, we copy the determnants array and the populatons array to an ntermedate array and swap ponters. The functon mergenewandold merges the old lst of determnants and a lst of sorted walkers accordng rules of ntator-fciqmc. We use ntermedate arrays and ponter swappng n ths functon as well The walkerdstrbuton class Ths class take care of the dstrbuton of walkers and determnants on the dfferent MPI tasks. Most of the MPI communcaton s handled here. As dscussed earler, the determnants are sorted and dstrbuted on the dfferent MPI tasks. Every teraton, the newly spawned walkers are sent to the MPI task where ther host determnant s lsted. To know where to sent the walkers, each MPI task must be aware of whch determnants that are lsted on whch MPI task. Ths nformaton s, as we dscussed n secton 16.3, contaned n the array btset<n>* pb_lastdet = new btset<n>[_numthreads]; whch contans the greatest determnant on each of the MPI tasks. After the spawnng step, each new walker s sorted accordng to whch determnant t populates. When the walkers are sorted, we only need fnd the frst and the last determnant of the nterval that we want to transmt to each of the MPI tasks. For ths purpose we use the STL bnary search functon std::lower_bound. Smlar to what we dd n the last secton, we overload std::lower_bound wth a functor that defnes the < operator. Ths class also redstrbutes the determnants when the MPI tasks are load balanced The loadbalancethreads class Ths class s responsble for the load balancng of the threads. The functon loadbalance fnds the dstrbuton of walkers accordng to Eq. (16.4). 2 std::sort s an mplementaton of the so called ntrosort algorthm [19], whch s a fast sortng algorthm wth a worst case scalng of O(N log(n)). 97

98 17.8 The walkerpropagator class Ths class performs a sngle FCIQMC step, and s nstantated on all threads. Each object of ths class receves the lst of walkers that s dstrbuted to the threads by the loadbalance class. It calculates the spawnng, kllng and clonng step accordng to the FCIQMC algorthm. The functon snglestep runs through the lst of determnants. For each determnant the followng steps are performed: (:) Call the class hamltonanelements and calculate the dagonal matrx element D Ĥ D. (:) Call the functon updatepojectedenergy to accumulate the projected energy f D 0 s connected to D. (:) Run through the lst of all walkers j on determnant (a) Assume that the preset probablty of a sngle exctaton s d s. If U < d s try to do a sngle exctaton Else try to do a double exctaton. where U s a random unform (0, 1). Then call sampleprojector whch draws the new determnant a pa r D or a pa qa r a s D and calculates the suggeston probablty, and trysngleexctaton or trydoubleexctaton whch calculates the Coulomb matrx elements and the exctaton ampltude, rolls the dce and spawns new walkers. (b) Try to kll or clone a walker. Draw a random unform U [0, 1] and calculate d p = τ( D Ĥ D S). If D Ĥ D < 0 then remove the walker f D Ĥ D > U. Else f D Ĥ D > 0 then add a walker f D Ĥ D > U. Addng or removng a walker s done by addng to or subtractng from the th element of the populatons array. We wll take a closer look at the functon sampleprojector. Ths functon draw a random determnant that s connected to D, and calculate the suggeston probablty. For a sngle exctaton we call double walkerpropagator<n>::sampleprojector( const nt act_w, nt &a1, nt &c1) Where act_w s the ndex of the determnant, and a1,c1 s changed to the ndces of the annhlated and created orbtals. For a double exctaton the functon s overloaded wth two more ndces a2 and c2. Frst we pck a random orbtal to be annhlated and store t as a1 98

99 a1 = o_random.irandomx(1,num_part); //fnd the poston of the th set bt = b_slater_actve._fnd_frst(); for (j=1;j<a1;j++) = b_slater_actve._fnd_next(); a1 = ; where _Fnd_frst() and _Fnd_next() are hghly optmzed functons of the btset class that fnds the frst bt or the frst bt after the th bt that s set. Then we construct a btset b_slatertemp whch contan all orbtals that preserve the total magnetc quantum number and the spn //fnd the magnetc quantum number of the new state ma = p_m[a1/2]; //the spn must be the same as that of the annhl. orb f (a1%2) //c1 s spn down (1) b_slatertemp = pbamc_dwn[ma+r]&~b_slater_actve; else //c1 s spn up (0) b_slatertemp = pbamc_up[ma+r]&~b_slater_actve; The btsets pbamc_up[ma+r] / pbamc_dwn[ma+r] contans all orbtals wth a magnetc quantum number ma and spn up / down. These btsets are ntalzed when the class s nstantated. b_slater_actve s the actve determnant D and p_m s the array contanng the magnetc quantum number of the orbtals. We count the number of orbtals n ths set. Ths number must be known to calculate the suggeston probablty. dm_s5 = b_slatertemp.count(); f (dm_s5==0) {return -1;} Then we pck a random orbtal from ths set. //pck random from S5 c1 = o_random.irandomx(1,dm_s5); //fnd the poston of the th set bt = b_slatertemp._fnd_frst(); for (j=1;j<c1;j++) = b_slatertemp._fnd_next(); c1 = ; return statc_cast<double>(num_part*dm_s5); The suggeston probablty for the sngle exctatons s the return value dvded by the predetermnad probablty of a sngle exctaton p s. The same strategy s used to fnd the doubly excted determnants and ther suggeston probablty The hamltonanelements class Ths class calculates the elements of the Hamltonan matrx, and s nstantated by the walkerpropagator class. It communcates wth the lbgrie class that holds the Coulomb matrx elements. 99

100 17.10 The lbgrie class Ths class tabulates the Coulomb matrx elements usng the ndexng scheme that s descrbed n secton 11. Ths class s only nstantated once on each MPI task. The reason why the lbgrie object s shared among the threads s that the amount of memory needed to store the Coulomb matrx elements can become several GB for large R. If each thread where to store ts own lbgrie object the memory footprnt of the program would become very large. The class varables are not changed after the ntalzaton. Ths means that several threads can access the class methods smultanously wthout rskng the occurrence of race condtons. Thus, sharng an object of the class between the threads wll not slow down the code The nputvars class Ths class reads runtme parameters from a text fle and stores them as class varables. Instead of passng the runtme parameters around n the program, we pass an object of ths class. 100

101 Chapter 18 Benchmarkng the Code In ths chapter we have tested the effcency of the code, both sequentally and wth OpenMP and MPI, and whch effect dfferent run tme parameters have on the executon tme Hotspots by CPU usage The hotspots are the most CPU ntensve parts of the code. These are the parts of the code that we would gan the most from optmzng. To fnd the hotspots we have performed a tests on a sngle workstaton. MPI was not used n ths test, but the scalng wth MPI s revewed separately later n ths chapter. (:) class walkerpropagator 38% of total CPU tme of whch 68% was spent n the functon sampleprojector. whch calculates the generatonal probabltes p gen for the sngle and the double exctatons. (:) class CRandomMersenne 30% of total CPU tme. Ths s the random number generator generatng random unforms. (:) class lbgrie 20% of total CPU tme of whch close to 100% was spent n the functon getcoulie. Ths functon fnds the Coulomb matrx elements gven the ndces (p, q, r, s) of the annhlated and created orbtals. (v:) class hamltonanelements 7% of total CPU tme of whch 94% was spent n the functon getsngleexctatonampltude whch calculates the ampltudes D Ĥ {a pa r } D. 6% was spent n the functon getdagonalelement whch calculates D Ĥ D (v:) The remanng CPU tme was manly spent watng because the threads had dfferent workloads. The sortwalkers class and the loadbalancethreads class spent less than 1% of the total run tme. 101

102 As we see, the calculaton of the generatonal probabltes p gen, the generaton of random unforms and accessng the Coulomb matrx elements s the most tme consumng part of the code. The sampleprojector method s beleved to be reasonably fast, and t s dffcult for us too see how ths functon could be further optmzed. It s however clear that the overall performance of the code would mprove much from such optmzatons, and alternatve samplng methods mght be an nterestng subject for future studes. The random number generator s a hghly optmzed mplementaton of the well known Mersenne-Twster algorthm, whch we beleve to be a good choce for our mplementaton. Although faster algorthms exsts, ths one does not compromse on the qualty of the random numbers. We have also nvested much effort n optmzng the method getcoulie that returns the Coulomb matrx elements, and we do not know any other method that s faster than ours for ths purpose Scalng wth openmp and MPI Because of the seral sectons of the code, the computatonal effcency wll only ncrease up to a certan number of nodes and cores. The total run tme can not be less that the executon tme of the seral part of the code, and the performance gan of addng an extra node falls rapdly close to ths lmt. Ths relatonshp s descrbed by Amdahls law whch states that f the seral part of the code spends s of the total run tme, the speedup wth n processes s n/(1 + s[n 1]) whch converges to 1/s for large n. The walkers on a sngle determnant can be splt between threads but not between MPI tasks. Ths means that the best possble speedup for a system wth N W walkers n total and n m walkers on the most populated determnant s of order N W /n m. If we ncrease the number of MPI tasks beyond roof[n W /n m ], we wll not see any speedup. Ths means that a smulaton wll always be lmted by the most populated MPI task wth n m walkers, and that smulatons wth many walkers on few determnants should be run wth as many threads as possble per MPI task. The speedup as a functon of the number of OpenMP threads and as a functon of MPI tasks s shown n Fg Ths plot shows that the MPI mplementaton s effcent and ndcates that the communcaton overhead s small. But even f the system whch we have smulated here scales well wth MPI, ths result does not necessarly apply to all systems. There are many factors whch affect the amount of communcaton between the nodes and the communcaton overhead. For example, we would expect less communcaton wth systems that are weakly correlated snce the spawnng probabltes on average are smaller (Hamltonan matrx closer to dagonal form). Lkewse, f the fluctuatons n the dstrbuton s small, we would expect less communcaton snce the determnants would have to be redstrbuted less often. These characterstcs are system dependent and dffcult to predct. Among the tunable parameters whch affect the amount of communcaton s the tme step τ, the determnant load parameter k D and the redstrbuton threshold. The tme step affects the walker dynamcs, and there exsts an optmal parameter from a samplng perspectve. But when we run the code on many nodes, the tme step s proportonal wth the spawnng rate, and should therefore be kept low to mnmze the communcaton overhead. The optmal values for τ, k D and the redstrbuton threshold are system dependent, and 102

103 should deally be optmzed for each new system. (a) Fgure 18.1: (MPI) Wall tme of smulatons wth a dfferent number processes or MPI tasks. All MPI tasks run on dfferent nodes to maxmze the communcaton overhead. (OpenMP) CPU tme of smulatons wth a dfferent number of cores. All runs performed on a sngle node. (Ideal scalng) The best possble parallel scalng s n 1, where n s the number of MPI tasks or threads Scalng wth the number of walkers As we have argued earler (see secton 16.4), we would expect the run tme of the part of the code that handles the kll/clone and spawn steps to scale lnearly wth the number of walkers and the number of determnants. The scalng of the part that handles sortng, annhlaton and redstrbuton of walkers s more dffcult to predct, but s expected to have a worst case scalng of O(N log(n)) 1 wth both the number of newly spawned determnants and the total number of determnants. However, the dependency of the number of determnants on the number of walkers s system dependent and not generally known. Furthermore, ths part of the code s not parallelzed wth OpenMP, and could therefore have a large negatve mpact on the overall run tme. Our experence from smulatons s that gven that the number of nodes are kept constant, the computaton tme scales close to lnearly wth the number of walkers. Ths s llustrated n Fg 18.2 where we have plotted the run tme of smulatons of dfferent systems as a functon of the number of walkers. 1 The sort algorthm has a worst case scalng of O(N log(n)) where N s the number of determnants that are to be sorted. We use C++ Standard Lbrary functon std::sort whch s an mplementaton of the ntrosort algorthm [19]. 103

104 Fgure 18.2: Scalng of the algorthm when the number of walkers are ncreased for systems wth a dfferent number of partcles N P, number of shells R + 1, spn s and magnetc quantum number m. All systems have an oscllator frequency ω = 1 and are run wth the same number of teratons. As we see from ths fgure, the run tme t seems to have a lnear dependency of the number of walkers The determnant load parameter and the redstrbuton threshold The CPU tme of smulatons wth dfferent values for the determnant load parameter k D s llustrated n Fg The optmal value for ths parameter dffers from system to system, thus t should deally be tuned for each new system. But as ths s a very tme consumng task we have used a fxed value k D = 1.4 n most smulatons. When t comes to the redstrbuton threshold, we have found that usng a value 10 2 s effcent for the systems that we have smulated. We have used a redstrbuton threshold of 0.03 for all smulatons. 104

105 Fgure 18.3: Dfferent smulatons where all runtme parameters are the same except the load weght parameter k D. Ths parameter determnes how the nodes are load balanced as defned n Eq. (16.4). As we see, the CPU tme s at ts mnma at k D 1.4. Ths number s however system dependent, and deally each new system should be mnmzed wth respect to k D to optmze the numercal speed. 105

106 Chapter 19 Modfyng the code to smulate other systems Our code s talored to smulate two dmensonal quantum dots, but t s straght forward to modfy t to handle other systems lke atoms, molecules or nucle. In ths short chapter we are lookng closer on how ths can be done. Assume that we want to apply the FCIQMC code to an arbtrary system. To do so, we need to know the Hamltonan matrx n the bass of the determnants { D } whch are constructed from the sngle partcle wave functons { ϕ }. In addton we need to keep track of the quantum numbers n, m, s,... of the sngle partcle wave functons. To be able to sample the spawnng step effcently, we must also know whch quanttes that are conserved n an nteracton D Ĥ D j. These propertes dffers from one quantum mechancal system to the next, and the code has to be modfed f the quantum numbers or the conservaton rules are dfferent from the two dmensonal parabolc quantum dots whch we have smulated. Only a few functons and classes wll have to be modfed or rewrtten to smulate dfferent many body systems. As we already have dscussed, we store the quantum numbers on arrays p_n and p_m. If the new system have addtonal quantum numbers, more such arrays would have to be ntroduced, and the classes walkerpropagator and hamltonanelement would have to undergo some small changes. In addton we would have to change the followng classes and functons: 1. The functon walkerpropagator::sampleprojector. Ths functon samples the suggeston probabltes (see: chapter 10) and has to be rewrtten f the new system has dfferent symmetres. Generally, only determnants D, D j where certan quanttes are conserved are connected, and only connected determnants should be sampled when we try to spawn new walkers. 2. The class hamltonanelement. Ths class has to be rewrtten to calculate the Hamltonan matrx elements of the new system. 3. The class lbgrie. Ths class must be replaced wth a class that stores or calculates the Coulomb matrx elements (or mode generally the nteracton matrx elements) of the new system. 106

107 4. The class ntsmulaton. Ths class ntates the arrays p_m and p_n whch contans the quantum numbers and has to be rewrtten. All the classes that admnstrates the walkers (walkercontanerclass, loadbalancethreads, sortwalkers and walkerdstrbuton), and the runsmulaton class would be left unchanged. 107

108 Part VI Results 108

109 In ths part we dscuss the results of a seres of smulatons wth the FCIQMC algorthm and the optmzed algorthm -FCIQMC. We test the algorthm, dscuss ts advantages and dsadvantages and compare t to other numercal methods, and we have also made a few predctons for dfferent open shell systems. The frst chapter s devoted to testng and valdatng the program, and we show that our code reproduces known results wth a hgh accuracy. In the next two chapters, we look at how the code performs wth dfferent systems wth a varyng degree of correlaton and a dfferent number of partcles. In the fourth and ffth chapter we look closer at dfferent methods to optmze the algorthm.more specfcally, we study the convergence of the results of smulatons wth - FCIQMC, wth both the plan harmonc oscllator bass and wth a Hartree- Fock bass. In the sxth chapter we calculate the extrapolated energes of dfferent open shell systems and compare the results to Dffuson Monte Carlo (DMC) resuls. We also dscuss the valdty of the extrapolaton formula and dscrepances between our results and the DMC energes. At last we summarze and dscuss our results together wth what we beleve are nterestng furure prospects. 109

110 Chapter 20 Valdatng the code 20.1 Introducton To valdate the code we have found some cases where we know the correct outcome of a smulaton. In some cases there exst closed form expressons whch provdes us wth the exact result and n other cases we have compared wth values from other numercal methods whch ought to gve the same outcome. We know that the FCIQMC energy should converge to the Full Confguraton Interacton (FCI) energy wthn a gven Hlbert space. An obvous way to valdate the code s therefore to compare our results wth FCI energes. We have compared our energes to the results of Refs. Kvaal 2006 [15], Rontan et. al. (2006) [29] and Olsen (2012) [20] whch have performed FCI smulatons wth the same Hlbert spaces as we study. In the case of the two partcle quantum dot, the closed form expresson of the energy for certan values of the oscllator frequency ω s known [37]. For example, the two partcle ω = 1 quantum dot has a ground state energy of exactly 3 Hartree (H). Because of the relatvely small dmensonalty of the Hlbert spaces of ths system, we can do smulatons wth a hgh number of shells and a hgh accuracy. The stuaton s dfferent for the ntator adapton of the algorthm, -FCIQMC. We do not even know the closed form expresson of the underlyng projecton operator, and consequently t s dffcult to fnd any good test cases. The only method we have found to test our mplementaton s to check that the energes actually converge to the FCIQMC value n the lmt of a hgh number of walkers and a low ntator lmt FCIQMC A smple system, two electrons n two shells The frst test to valdate the code was to compare wth the FCI ground state energy of the two partcle system n two shells wth an oscllator frequency ω = 1. In two shells, the sngle partcle bass conssts of only sx spn orbtals {ϕ 1, ϕ 2, ϕ 3,..., ϕ 6 }, (20.1) and the Hlbert space s spanned by the three determnants whch have a total spn and magnetc quantum number equal to zero H = { ϕ 1, ϕ 2, ϕ 3, ϕ 6, ϕ 4, ϕ 5 } = { 1, 2, 3 }. (20.2) 110

111 The Hamltonan matrx s a 3 3 matrx wth the elements H,j = Ĥ j,, j [1, 3], (20.3) The numercal value of the matrx elements s generated wth OpenFCI, and we fnd the ground state energy to be H, a result whch s reproduced wth our code Comparng FCIQMC energes and FCI energes The next test was to compare FCIQMC energes to the results of Refs. [15, 20, 29] whch have used FCI to calculate ground state energes of parabolc quantum dots. In Tab. 20.1, the results of a small seres FCIQMC runs are lsted and compared wth FCI results. The FCI results was n most cases reproduced wth a small error n the range of 0.6 mh or less. The excepton s the smulatons of the 5 partcle quantum dots wth R = 5, where the FCI energes are sgnfcantly lower. Note that the results stll agree to wthn 1 mh, and that the FCI results for the same system are reproduced wth R = 7. When the dfferences are so small, t s dffcult to dscuss them further wthout an error estmate for the FCI results. All Refs. have used the Lanczos algorthm, whch s an teratve procedure where a trdagonal matrx whch s smlar to the Hamltonan matrx s constructed. The egenvalues of ths matrx are expected to converge to the energy spectrum of the Hamltonan. When the dfference between the estmated ground state energy from one teraton to the next s smaller than some preset value δ k, the smulaton s assumed to have converged. But δ k does not set an upper bound for the error. Olsen(2012) [20] demonstrated that for some systems the energy seems to converge to one value before t suddenly drops and converges to the real ground state value. As an example, for the four partcle quantum dot wth m = s = 0 and ω = 1/ 6, Refs. [15, 29] found a ground state energy of approxmately H whle the true value s H lower at H. In ths case, a δ k smaller than 10 7 H was needed for the smulaton to converge to the lower value (whch was the same groundstate energy as the FCIQMC algorthm found for ths system). The convergence crtera for the energy δ k s only provded by Olsen [20], whch used a value of 10 6 H/ω 2 n most smulatons. Although t s possble to estmate the Lanczos error [8], only Olsen has dscussed ths subject, and none of the references have ncluded the errors of ther results. R = 5 R = 5 R = 5 R = 7 R = 7 R = 7 N ω m 2s S E p FCI S E p FCI (6) (1) ( ) (4) (1) ( ) 2 1/ (2) (1) ( ) (2) (1) ( ) 3 1/ (4) (8) ( ) (5) (9) ( ) 3 1/ (3) (2) ( ) (3) (2) ( ) 4 1/ (5) (4) ( ) (5) (6) ( ) 4 1/ (4) (3) ( ) (5) (5) ( ) 5 1/ (1) (1) ( ) (1) (1) ( ) 5 1/ (4) (1) ( ) (4) (2) ( ) Table 20.1: Here and m, s and ω are the magnetc quantum number, the spn and the oscllator frequency, and R+1 the number of shells. S and E p are the statstcal and generatonal estmator, and energes are n unts of Hartree. The FCI energes are provded by [15] ( ), [29] ( ) and [20] ( ). Note that the number of dgts of the FCI energes does not reflect the accuracy of the results. 111

112 Testng FCIQMC energes aganst analytcal results In the artcle by Taut [37], t s shown that the ground state energy of the two partcle quantum dot wth ω = 1 s exactly 3H. We have performed a seres of smulatons wth R = 5, 10, 15, 20, 25 and 30 to check whether the energy converges towards the exact result. The energy s mproved every tme we ncrease the number of shells, and wth R = 30 the energy s approxmately 2mH too hgh. Also note that, as wll be dscussed later, the extrapolated energy (usng Eq. (5.11)) yelds an energy (1)H. The results are lsted n Tab and llustrated n Fg R S E P (6) (1) (6) (2) (7) (2) (8) (2) (4) (1) (3) (1) Table 20.2: Smulatons wth two partcle quantum dots wth spn and magnetc quantum number m = s = 0 and dfferent number of shells R + 1 n the bass. All lsted energes have unts of Hartree. The exact result n the lmt R s 3 Hartrees FCIQMC To ensure that the -FCIQMC energy has converged, a seres of smulatons must be run wth an ncreasng number of walkers. When both S and E P are converged to the same energy, we can assume that the energy has converged to the FCIQMC value. Based on the results of Ref. [4], we would also expect ths to happen wth a number of walkers that s smaller that N C. We have observed ths behavour for all -FCIQMC smuatons that we have performed. As an example we wll look at the 5 partcle quantum dot wth ω = 1 and 2s = m = 1 n 9 shells. We have used the FCIQMC result wth partcles as a benchmark for the energy. The FCI space conssts of determnants and the crtcal number of walkers s approxmately As we see from Fg. 20.1, FCIQMC must be run wth at least the crtcal number of walkers to converge whle the -FCIQMC energes converges at a much lower number of walkers. Already at 10 4 walkers the -FCIQMC energy s wthn a few mh from the FCIQMC result, and as the number of walkers s ncreased the errorbars gets smaller. 112

113 (a) N I = 1 (b) N I = 2 (c) N I = 3 Fgure 20.1: Smulatons for systems wth 5 partcles, 9 shells, magnetc quantum number and spn m = s = 1 and a Hlbert space wth a dmensonalty dm(h ) = The crtcal number of walkers s N C Here S s the generatonal estmator and E P s the projected estmator whle conv s the exact FCI value at (5)H. Here we have used results from an FCIQMC smulaton wth walkers as the exact energy. All smulatons are run wth teratons. 113

114 Chapter 21 The statstcal estmators In ths secton we wll dscuss the behavour of the generatonal estmator S and the projected estmator E P. We have ncluded results from smulatons of dfferent 2 partcle systems wth 8 shells and dfferent oscllator frequences ω. Although these systems have a very small Hlbert space wth only 104 determnants n the bass, the statstcal estmators show the same general behavour as we have seen for other systems. We have run 4 smulatons wth 10 4 walkers whch s well above the crtcal number of walkers that s necessary for the energes to converge. The value for ω s set to dfferent values, 10.0, 1.0, 0.1 and 0.01, otherwse all parameters are the same. The results are lsted n Tab and as we see, both the projected estmator E p and the generatonal estmator S converges to the FCI value. Frst, notce that the relatve error (measured as a fracton of the total energy) of both estmators gets larger for smaller ω. Ths s a general trend n our smulatons, and mght be explaned as a consequence of the degree of correlaton n the systems. For hgh ω the system s weakly correlated, and exctatons wll be less probable. In ths case, most of the walkers wll presumably be concentrated n a smaller part of the Hlbert space. In a strongly correlated system, the walkers wll presumably be more spread out n the Hlbert space, and one can say (wth a lttle sloppness) that we are samplng a larger number of determnants wth the same number of walkers. From Tab we also see that the projected estmator E p gves the best energy estmate for large ω, whle the generatonal estmator S does for small ω. Ths s llustrated n Fg where the fluctuatons E p gradually become larger than the fluctuatons of S as we decrease ω. There may be several reasons why the statstcal estmators have a dfferent dependence of the oscllator frequency ω. Frst, the projected estmator does only depend on walkers that populates determnants that are connected to the reference determnant D 0 whle the generatonal estmator depends on the total populaton. Ths means that for most systems, the projected estmator are calculated on bass of a subset of the populaton. (Ths does not apply to the two partcle system where all determnants are connected to D 0 ). Second, the general estmator s senstve to the total number of walkers whle the projected estmator s senstve to the shape of the dstrbuton. And thrd, the projected estmator s extra senstve to the number of walkers n 0 on the reference determnant D 0 snce t appears n the denomnator n the expresson for the projected energy. If n 0 gets to small, we would expect the projected estmator to have a large error. It should be commented that systems where the D 0 component of the ground state s small, may requre a very large 114

115 populaton to acqure a suffcently large n 0. ω n 0 / N W ε S /ε Ep FCI [20] S E p (40) (2) (6) (1) (10) (5) (3) (6) Table 21.1: The rato of the populaton on the reference determnant to the total populaton n 0 / N W and rato of the error of the generatonal estmator to the statstcal estmator ε S /ε Ep ncrease as we decrease the oscllator frequency ω. Thus the generatonal estmator wll gve a better estmate of the energy than the projected estmator when ω s small. FCI energes were only avalable for ω = 1 and 0.1. All energes are n unts of Hartree. Havng two dfferent estmates for the energy s valuable as t heghtens our confdence n the results. In fact, n the case that the estmators does not agree we can assume that the smulatons are not converged. The estmators are only, as we can see from ther defntons, expected to gve the same result n the case that the dstrbuton of walkers resembles the ground state of the gven Hlbert space. 115

wth dfferent oscllator frequences ω. All energes are n unts of Hartree.

116 (a) ω = 10 (b) ω = 1 (c) ω = 0.1 (d) ω = 0.01 Fgure 21.1: The nstantaneous value of the projected energy E p (τ) and the shft S(τ) durng teratons for smulatons wth dfferent oscllator frequences ω. All energes are n unts of Hartree. The fluctuatons n E p (τ) becomes larger relatve to the fluctuatons n S(τ) as we decrease ω. All smulatons are wth two partcle quantum dots wth spn and magnetc quantum number s = m = 0 and wth eght shells n the bass. 116

A how to guide to second quantization method.

A how to guide to second quantization method. Phys. 67 (Graduate Quantum Mechancs Sprng 2009 Prof. Pu K. Lam. Verson 3 (4/3/2009 A how to gude to second quantzaton method. -> Second quantzaton s a mathematcal notaton desgned to handle dentcal partcle