Some Comments on Accelerating Convergence of Iterative Sequences Using Direct Inversion of the Iterative Subspace (DIIS)

C. David Sherrill
School of Chemistry and Biochemistry
Georgia Institute of Technology
May 1998

Introduction

Most standard textbook approaches to solving systems of linear equations or diagonalizing matrices are described as direct methods, and they typically require a fixed number of mathematical operations which depends on the dimensions of the problem. These methods generally require access to matrix elements in random order, which poses serious difficulties for the very large matrices typically encountered in computational quantum chemistry: random access of large disk files becomes prohibitively expensive, and often the matrices are too large even to store on disk! In such cases, one may avoid the need for random access to individual matrix elements by turning to iterative techniques, which require only the repeated evaluation of matrix-vector products. Unfortunately, iterative methods are not guaranteed to converge, and they can have difficulties when the matrix is not diagonally dominant or when there are nearly degenerate solutions.

The well-known Davidson method [1] for the iterative solution of the lowest few eigenvalues and eigenvectors of large, symmetric matrices combines some of the features of direct and iterative techniques. Although only matrix-vector operations are required, and there is no need to store the Hamiltonian matrix explicitly, Davidson's method also uses direct methods to diagonalize a small Hamiltonian matrix formed in the subspace of all trial CI vectors that have been considered up to the present iteration. The current estimates of the eigenvalues of the full Hamiltonian matrix are obtained as the eigenvalues of the small Hamiltonian matrix, and the current CI vectors are obtained as the linear combinations of the trial vectors whose coefficients are given by the eigenvectors of the small Hamiltonian matrix.
Pople and co-workers later used related ideas to iteratively solve the large systems of linear equations occurring in the coupled-perturbed Hartree-Fock method [2]. In 1980, Pulay published a somewhat similar method [3] known as the direct inversion of the iterative subspace (DIIS). Like the Davidson method, DIIS applies direct methods to a small linear algebra problem (now a system of linear equations instead of an eigenvalue problem) in a subspace
formed by taking a set of trial vectors from the full-dimensional space. Pulay found that DIIS could be useful for accelerating the convergence of self-consistent-field (SCF) procedures and, to a lesser extent, geometry optimizations.

The Mathematics of DIIS

Suppose that we have a set of trial vectors \{ \mathbf{p}^i \} which have been generated during the iterative solution of a problem. Now let us form a set of residual vectors defined as

\Delta \mathbf{p}^i = \mathbf{p}^{i+1} - \mathbf{p}^i.   (1)

The DIIS method assumes that a good approximation to the final solution \mathbf{p}^f can be obtained as a linear combination of the previous guess vectors,

\mathbf{p} = \sum_{i=1}^{m} c_i \mathbf{p}^i,   (2)

where m is the number of previous vectors (in practice, only the most recent few vectors are used). The coefficients c_i are obtained by requiring that the associated residual vector

\Delta \mathbf{p} = \sum_{i=1}^{m} c_i \Delta \mathbf{p}^i   (3)

approximates the zero vector in a least-squares sense. Furthermore, the coefficients are required to add to one,

\sum_{i=1}^{m} c_i = 1.   (4)

The motivation for the latter requirement can be seen as follows. Each of our trial solutions \mathbf{p}^i can be written as the exact solution plus an error term, \mathbf{p}^f + \mathbf{e}^i. Then the DIIS approximate solution is given by

\mathbf{p} = \sum_{i=1}^{m} c_i \left( \mathbf{p}^f + \mathbf{e}^i \right)
           = \mathbf{p}^f \sum_{i=1}^{m} c_i + \sum_{i=1}^{m} c_i \mathbf{e}^i.   (5)

Hence, we wish to minimize the actual error, which is the second term in the equation above (of course, in practice we do not know \mathbf{e}^i, only \Delta \mathbf{p}^i); doing so would make the second term vanish, leaving only the first term. For \mathbf{p} = \mathbf{p}^f, we must have \sum_{i=1}^{m} c_i = 1.
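The role of constraint (4) can be checked numerically. The sketch below uses made-up illustrative vectors (not data from any real calculation): if each trial vector is the exact solution plus an error, then any combination whose coefficients sum to one reproduces the exact solution plus the combined error, exactly as in eq. (5).

```python
import numpy as np

# Illustrative data only: an arbitrary "exact" solution and errors.
p_f = np.array([1.0, 2.0, 3.0])
errors = [np.array([0.10, -0.20, 0.00]),
          np.array([-0.05, 0.10, 0.02])]
trials = [p_f + e for e in errors]          # p^i = p^f + e^i

c = np.array([0.3, 0.7])                    # any coefficients with sum 1
combo = sum(ci * pi for ci, pi in zip(c, trials))
combined_error = sum(ci * ei for ci, ei in zip(c, errors))

# eq. (5): the combination equals p^f plus the combined error,
# so driving the combined error to zero recovers p^f itself.
assert np.allclose(combo, p_f + combined_error)
```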
Thus, we wish to minimize the norm of the residuum vector \Delta \mathbf{p},

\langle \Delta \mathbf{p} | \Delta \mathbf{p} \rangle = \sum_{ij} c_i c_j \langle \Delta \mathbf{p}^i | \Delta \mathbf{p}^j \rangle,   (6)

subject to the constraint (4). These requirements can be satisfied by minimizing the following function with Lagrangian multiplier \lambda,

L = \mathbf{c}^{\dagger} \mathbf{B} \mathbf{c} - \lambda \left( \sum_i c_i - 1 \right),   (7)

where \mathbf{B} is the matrix of overlaps

B_{ij} = \langle \Delta \mathbf{p}^i | \Delta \mathbf{p}^j \rangle.   (8)

We can minimize L with respect to a coefficient c_k to obtain (assuming real quantities)

\frac{\partial L}{\partial c_k} = 0 = \sum_j c_j B_{kj} + \sum_i c_i B_{ik} - \lambda = 2 \sum_i c_i B_{ki} - \lambda.   (9)

We can absorb the factor of 2 into \lambda to obtain the following matrix equation, which is eq. (6) of Pulay [3]:

\begin{pmatrix}
B_{11} & B_{12} & \cdots & B_{1m} & -1 \\
B_{21} & B_{22} & \cdots & B_{2m} & -1 \\
\vdots & \vdots &        & \vdots & \vdots \\
B_{m1} & B_{m2} & \cdots & B_{mm} & -1 \\
-1     & -1     & \cdots & -1     & 0
\end{pmatrix}
\begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_m \\ \lambda \end{pmatrix}
=
\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ -1 \end{pmatrix}.   (10)

Programming DIIS

The DIIS procedure seems so simple that further comment on specific computational implementations might appear superfluous. However, I have found that the precise computational details are absolutely crucial for effective interpolation. Hence, I will describe here my implementation of DIIS for the optimization of orbitals in a two-step CASSCF program. There are probably many variations on this implementation which would also work, but often seemingly inconsequential changes can make dramatic differences in efficiency. In the two-step CASSCF procedure, one begins with a set of guess orbitals, solves the full CI problem in the active space, determines the gradient for orbital rotations, takes a step in orbital
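Setting up and solving eq. (10) is a small dense linear-algebra problem. One way it might be written with NumPy is sketched below; the function names are my own and do not come from any program discussed here.

```python
import numpy as np

def diis_coefficients(error_vecs):
    """Solve the bordered linear system of eq. (10) for the mixing
    coefficients c_i, given the list of error vectors Delta-p^i."""
    m = len(error_vecs)
    B = -np.ones((m + 1, m + 1))    # -1 borders for the constraint
    B[m, m] = 0.0
    for i in range(m):
        for j in range(m):
            B[i, j] = np.dot(error_vecs[i], error_vecs[j])  # eq. (8)
    rhs = np.zeros(m + 1)
    rhs[m] = -1.0
    sol = np.linalg.solve(B, rhs)
    return sol[:m]                  # last element is lambda; discard

def diis_extrapolate(trial_vecs, error_vecs):
    """Form the interpolated vector p = sum_i c_i p^i of eq. (2)."""
    c = diis_coefficients(error_vecs)
    return sum(ci * pi for ci, pi in zip(c, trial_vecs))
```

For two orthogonal unit error vectors, for example, symmetry forces c = (1/2, 1/2), and the coefficients always satisfy the constraint (4) by construction.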
rotation (theta) space down the gradient direction (i.e., obtains new guess orbitals), and repeats the process iteratively until convergence. To allow DIIS interpolation, one can express the current set of guess orbitals as the result of the multiplication of a set of Givens rotation matrices by a matrix of reference orbitals (C_{\mu p} = \sum_q C^o_{\mu q} U_{qp}; see [4]). The rotation angles which define the unitary transformation U (a product of Givens rotation matrices) comprise a vector of parameters, \mathbf{p}. In this case, one can define the error vectors as the differences between subsequent sets of orbital rotation angles, or one could also reasonably choose the orbital gradient vector.

In my detcas program, the regular theta step is determined using a Newton-Raphson approach with an approximate, diagonal orbital Hessian. This is equivalent to scaling the orbital gradient to a new coordinate system. Since the step in theta space is just the scaled gradient, the scaled gradient is the same as the difference between successive theta vectors (apart from a sign) before the DIIS procedure starts. However, I have found it much better to associate the gradient vector with the next iteration's theta vector, not with the theta vector from which it was computed. In other words, it is best to change eq. (1) to the following:

\Delta \mathbf{p}^{i+1} = \mathbf{p}^{i+1} - \mathbf{p}^i.   (11)

Another general consideration is that one does not want to add an interpolated vector to the list of vectors \{\mathbf{p}\} unless it contains some new character to add to the subspace. Otherwise, linear dependencies can result. An outline of my DIIS procedure for the detcas program is given below:

1. Using current orbitals \mathbf{p}^i, obtain scaled orbital gradient \mathbf{g}^i.

2. Take Newton-Raphson step \mathbf{p}^{i+1} = \mathbf{p}^i - \mathbf{g}^i.

3. Add \mathbf{p}^{i+1} to the list of vectors. Add \Delta \mathbf{p}^{i+1} = -\mathbf{g}^i to the list of error vectors.

4. Perform DIIS interpolation to obtain a new guess vector. Overwrite \mathbf{p}^{i+1} with the DIIS interpolant. (This vector will never be added to the list of vectors.)

5. Increment i, begin new cycle.
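The five steps above can be sketched as a generic DIIS-accelerated fixed-point loop. This is only an illustration, not code from detcas: the function `grad` is a hypothetical stand-in for the scaled orbital gradient of step 1, and the condition-number guard is an added safeguard against the linear dependence mentioned above, not something prescribed by the outline.

```python
import numpy as np

def diis_loop(grad, p0, max_vecs=6, max_iter=300, tol=1e-9):
    """DIIS-accelerated iteration following the five-step outline."""
    p = np.asarray(p0, dtype=float)
    trials, errors = [], []
    for _ in range(max_iter):
        g = grad(p)                       # step 1: scaled gradient
        if np.linalg.norm(g) < tol:
            break
        p_next = p - g                    # step 2: Newton-Raphson step
        trials.append(p_next)             # step 3: store p^{i+1} ...
        errors.append(-g)                 # ... and Dp^{i+1} = -g (eq. 11)
        trials = trials[-max_vecs:]       # keep only the recent vectors
        errors = errors[-max_vecs:]
        m = len(errors)
        if m > 1:                         # step 4: DIIS interpolation
            B = -np.ones((m + 1, m + 1))  # eq. (10), bordered with -1
            B[m, m] = 0.0
            for i in range(m):
                for j in range(m):
                    B[i, j] = np.dot(errors[i], errors[j])
            rhs = np.zeros(m + 1)
            rhs[m] = -1.0
            # skip interpolation if the subspace is (nearly) dependent
            if np.linalg.cond(B) < 1.0e10:
                c = np.linalg.solve(B, rhs)[:m]
                p_next = sum(ci * pi for ci, pi in zip(c, trials))
        p = p_next                        # step 5: begin new cycle
    return p
```

As a toy usage example, a linear model gradient g(p) = alpha (A p - b) converges to the solution of A p = b; in the CASSCF setting the gradient would instead come from the orbital optimization.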
References

[1] E. R. Davidson, J. Comp. Phys. 17, 87 (1975).

[2] J. A. Pople, R. Krishnan, H. B. Schlegel, and J. S. Binkley, Int. J. Quantum Chem. Symp. 13, 225 (1979).

[3] P. Pulay, Chem. Phys. Lett. 73, 393 (1980).

[4] M. Head-Gordon and J. A. Pople, J. Phys. Chem. 92, 3063 (1988).