Session 2: Solving the SCF equations Daniel Lambrecht Department of Chemistry University of Pittsburgh Quantum Chemistry Workshop May 7th 2014 Pittsburgh, PA Self-Consistent Field (SCF) Procedure Calculate F(C) FC = eSC from Atkins/Friedman Choices for the “initial guess” core Hamiltonian = “zero guess” Superposition of Atomic Densities (SAD) Extended Hückel Theory (EHT) … Solving the generalized eigenvalue problem FC = SC✏ S S 1 2 1 2 1 2 FC = S 1 2 FS 0 SC✏ 1 2 1 2 S C = S C✏ 0 0 FC =C✏ diag(F’) oscillations. A given density matrix Dn gives a Fock matrix Fn, which, upon diagonalization, gives a density matrix Dn+1. The Fock matrix Fn+1 from Dn+1 gives a density matrix Dn+2 that is close to Dn, but Dn and Dn+1 are very different, as illustrated in Figure 3.5. The damping procedure tries to solve this by replacing the current density matrix with a weighted average, D′n+1 = wDn + (1 − w)Dn+1. The weighting factor w may be chosen as a constant or changed dynamically during the SCF procedure. Converging the HF solutions Density D 0 F0 D 2 F2 D4 F 4 Converged value Iterations D 1 F1 1. 2. 3. 4. 5. D 3 F3 D 5 F5 Figure 3.5 An oscillating SCF procedure Extrapolation (3) Level shifting. This technique is perhaps best understood in the formulation of a rotation of the MOs that form the basis for the Fock operator (Section 3.6). At Damping convergence, the Fock matrix elements in the MO basis between occupied and virtual orbitals are zero. The iterative procedure involves mixing (making linear Level shifting combinations of) occupied and virtual MOs. During the iterative procedure, these mixings may be large, causing oscillations or making the total energy increase. The Direct inversion in the iterative subspace (DIIS) degree of mixing may be reduced by artificially increasing the energy of the virtual orbitals. If a sufficiently large constant is added to the virtual orbital energies, it Direct minimization can be shown that the total energy(DM) is guaranteed to decrease, thereby forcing con13 vergence. The more the virtual orbitals are raised in energy, the more stable is the ∑ ( ) ci = 1(E n +1= n +1c) E erged solution has an error of zero, and the DIIS method forms a linear comn +1 iteri i hifts, convergence is guaranteed, but it is likely to occur very slowly, and density matrices (F , F , F , . . . and D , D , D , . . .) are produced. At each 0 1 2 0 1 2 n ground andindicators maywe in that, some cases converge to state that is not the state. ion of the error in ato least squares sense, is aaminimum (as close try find the point with lowest error, which iis =0 not necessari i = 0 ation, it is also assumed that an estimate of the “error” (E , E , E , . . .) is available, 0 E1 me cases to a state that is not the ground state. =2 ci Eni was developed by ro as possible). In the function space generated by the previous iterations n +1procedure (4)converge Direct inversion in the iterative subspace (DIIS). This points actually calculated. It is common to use the trace (sum of diago i.e. how far the current Fock/density matrix is from the converged solution. The 3.8 SCF TECHNIQUES i = 0 14 ry to find the point with lowest error, which is not necessarily one of the n in the iterative subspace (DIIS). This procedure was developed by 1 efficient in co cinormalization =very P. Pulay and is an extrapolation procedure. It has proved to be Minimization of the ErrF subject to the converged solution has an error of zero, and the DIIS method forms a linear comof the matrix product the error matrix with itself as a scalar in n tsan actually calculated. Itprocedure. is common to14use the trace (sum of of diagonal elements) i =0 extrapolation It has proved to be very efficient in forcing convergence and itself in reducing the number iterations the same time, bination of the error indicators that, in aa least squares sense, isof a minimum (asat close e matrix product of the error matrix with as scalar indicator of the 1 c = Lagrange method (Section 12.5), and leads to the i n followi error. gence and in reducing the number of iterations at the same time, to zero as now possible). space generated previous iterations and it is one In of the thefunction most commonly used for helping SCF conver. ithe =0 ErrF Minimization ofbymethods the subject to the normalization * F = ci Fi n normalizat where l is the multiplier associated with the ne of the we most commonly used methods for helping SCF convertry to find the point with lowest error, which is not necessarily one of the gence. The idea is as follows. AsLagrange the iterative procedure runs, a sequence of Fock method (Section 12.5), and leads to the follo i = 0 ErrF c = trace E ⋅ E ( ) ( ) ErrF c = trace E ⋅ E ( ) ( ) n + 1 n + 1 n + 1 n + 1 Minimization the ErrF to the normalization constraint i pointsAs actually calculated. It is commonruns, toofuse the tracesubject (sumof of Fock diagonal elements) a is as follows. the matrices iterative procedure a sequence and density (F , F , F , . . . and D , D , D , . . .) are produced. At each iter0 1 2 where l is0 the 1 multiplier 2 n associated with the normaliz n of the matrix product of the error matrix with itself as a scalar indicator of the method 12.5), leads to the following set of l a11the aeach L a − 1 c 0 The(Section only remaining question is the nature of the trices (F0, ation, F1, F2,it. .is. Ealso and , iD are produced. At iterEiLagrange 12 and 1 n 1 0c 1, Dthat 2, . . .) ⎛ ⎞ ⎛ ⎞ ⎛ ⎞ n +1 = D assumed an estimate of “error” (E , E , E , . . .) is available, ∑ 0 1 2 error. En +associated (3.64)ci E iwith the normalization. i =0 1= where l is the multiplier ∑ ∑ ∑ Direct inversion in the iterative subspace (DIIS) ∑ ∑ ∑ a12 L −1⎞ ⎛ cThe assumed that an estimate the “error” (E0, E is from available, the difference FDS SDF (S isathe overlap 11 − converged 1solution. n 1 ⎞ matr ⎛ athe ⎛ ⎟0 1, E 2, . . .)is i.e. how farn the of current Fock/density matrix ⎜ ⎟ ⎜ ⎟ ⎜ i =0 L a a a − 1 c 0 21 22 2 n 2 ErrF c = trace E ⋅ E ( ) ( ) n + 1 n + 1 e current Fock/density is from the of converged solution. ent the SCF energy withforms respect to ⎜ aThe ⎟⎞⎟⎜com⎟ coe ⎜ ⎟0 ci matrix =1 L a⎞2a⎟n linear −01the cMO converged∑solution has an error zero,of and the DIIS method 21 a1na22 −1 2 ⎜ a a L c ⎜ ⎜ n 11 12 1 ⎛ ⎞ ⎛ ⎛ n i =0 ⎜com⎟ ⎜close ⎟ M⎜ tion has an error of zero, and the DIIS method forms a linear = to work well in practice. A closely related method M M O M M M bination of the error indicators that, in a least squares sense, is a minimum (as En +1 = ∑ ci Ei ⎜ ⎜ a21 ci a=221 L ⎟ ⎜ c 2 ⎟M⎟ ⎜15 ⎜ 0M ⎟⎟⎟⎜ M ⎟⎜ = ⎜ ⎟ M a2n M −1O ⎜ mization of the subject normalization constraint isminimum handled by(as the (3.64) error indicators intoa the least squares sense, is aspace close i =0 cator, and has the acronym to ErrF zerothat, as possible). In the function generated by the previous ⎜ ⎟EDIIS. ⎜ ⎟⎟ ⎜⎜iterations ⎟⎟⎜ ⎟⎜ ⎜ ⎟ i = 0 ⎜ ⎜ ⎟ aL L a1⎟nn= ⎜c−Mn1⎟ cn 0 0 ange method (Section 12.5), and leads to then following set of linear equations, a a − M M O M M n 1 M ana 2nn n 1 n 2 ⎜ ⎟ ⎜ sible). In the function space generated by the previous iterations (5) “Direct minimization” techniques. we try associated to find the with lowest error, which ⎜is not necessarily⎟ The one ⎟variationa of ⎜ the ⎟⎜ ⎜ ⎟ 1 c = ⎜ ⎜ ⎟ e l is the multiplier with point the normalization. i ∑ ⎝ −normalization ⎠⎟⎠⎝ − l ⎠⎝ is⎝ ⎠h ⎟ ⎜function ⎟1⎠constraint ⎜elements) 1 − 1 L − 0 − Minimization of ErrF subject the ato L a − 1 c the pointpoints with lowest error, which isis common not necessarily one of the ⎝⎜⎜−ause ⎝ n1 1 the n− 2 trace nn n i =0Itthe to minimize the energy as a of the MO actually calculated. to (sum of diagonal 1 L − 1 0 l 1 − − ⎟⎜ ⎟ ⎜ ⎟ a a L a − 1 c 0 11 12 1 n 1 ⎛ ⎞ ⎛ ⎞ ⎛ ⎞ calculated.Minimization is common to ErrF usemethod the (sum of diagonal elements) ⎝ −1matrix ⎠given ⎝following ⎝ −eq. ⎠of Lagrange 12.5), and leads the set of lin aijas trace E E =−handled ⋅indicator ofItthe matrix product oftrace theto(Section error matrix with itself a1toscalar the ( density elements, as (3.54). In − 1 L 0 l 1 i− j⎠) by of the subject the normalization constraint is by the ⎜ a21 a22 L a2n −1⎟ ⎜ c 2 ⎟ ⎜ 0 ⎟ ) product ofLagrange the matrix with itself a no scalar of(ofE the ij = trace i ⋅ E normalization. jequations, error. where l is the with the method (Section and toassociated theaaindicator following set linear different from other types of non-linear opti ⎜ error ⎟ ⎜12.5), ⎟multiplier ⎜ as ⎟leads Ac = b ij = trace(E i ⋅ E j ) = M M O M M M M ⎜ l is the multiplier ⎟associated ⎜ ⎟ ⎜ with ⎟ where the normalization. technique, such as steepest descent, conjugate −1 ⎜ an 1 an 2 L ann −1⎟ ⎜ cn ⎟ErrF ⎜ 0 (⎟ c ) = trace Ac = b ⋅ Ena (En=+L =1A b1 ⎞ ⎛ 0 ⎞ Ac 1b +11)n c − a a c 11 12 ⎛ ⎞ ⎛ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ be used (see Chapter 12 for details a11 ⎠⋅ E a12 ⎠ )L⎝ a⎠1n −methods 1⎞n ⎛ c1 ⎞ ⎛ can 0 ⎞ (3.65) ⎛ ⎝ −1 (c− ⎝ ErrF E ) 1= trace ( − 1 L −1 n +01 −nl+1 −1 −b1 c = A In iteration n the matrix (n + 1) × ⎜ ⎟ ⎜ has ⎟ dimension ⎜ 0the ⎟ variational c = A b ⎜ a21 a22 L Ea2n a=21 ⎟As ⎜ cca222 ⎟mentioned ⎜L ⎟ aA in Section 3.6, − 1 c − 1 0 2 n 2 E n n +1 SCF∑ aij = trace(Ei ⋅ E j ) ⎜ 3.8 103 ⎟TECHNIQUES ⎜ i ⎟ i20. ⎜ The ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ less than coefficients c can be obtained b (3.64) terms of an exponential transformation of the M i = 0 = M M O M M M M In iteration n the A matrix has dimension (n + 1) × (n + 1), wh E E = c i ⎜i ⎟ ⎜ M⎟ ⎜O⎟ = ⎜ M (n M M M M Ac =nb+1 ∑In ⎜ ⎟ ⎜ ⎟ ⎟ by iteration n the A matrix has dimension + 1) × (n n matrix and multiplying it onto the b vector, i.e. in the “si (3.64) ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ i = 0 ational parameters contained in an X matrix. N less than 20. The coefficients c can be obtained directly a a L a − 1 c 0 n n −1 n1 n2 nn c n= A b ⎜cFai *=n 1=1linear ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ an⎟2 equations ⎜L⎟ ann are (3.65) − 1 c 0 the solved by “direct inversi ∑ n ci⎠Fover (3.66) less⎝ −1than 20. The coefficients c can be obtained by ⎠∑ ⎝− ⎝ −onto ⎠ thethe matrix and multiplying b vector, i.e. in the “subspace” of MO coefficients in eq. (3.54) for n ferred i it −1 L − 1 0 l 1 ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ i = 0 eration n the A matrix + 1), where n usually is coefficients that minimize the e 1 dimension (n + 1) × (nHaving ci =has i =0 obtained the ∑ ⎝ ⎠“direct ⎝vector, ⎠ inversion”, ⎝− ⎠orthonormal) the linear equations are solved by thus t not independent (the MOs must be − 1 − 1 L − 1 0 l 1 − a trace E E = ⋅ ( ) matrix and multiplying it onto the b i.e. in the “sub ij i j than 20. The coefficients c can be obtained by directly inverting the A i =0 the same set of coefficients isthe used for error generating a Minimization the Having ErrF subject toas the normalization constraint isenergy handled by functio the in obtained the coefficients that minimize the Theit only remaining question is the nature of the error function. Pulay suggested a series expansion, and expanded ix and multiplying onto the bof vector, i.e. in the “subspace” of the “iterations” Aclinear =b the equations are solved by “direct inversion a trace E E = ⋅ ( ) ij i j (F*) at iteration n, which is used in place Fn for16 Lagrange method (Section 12.5), and leads to the following set of linear equations, flinear the ErrF subject to the normalization constraint is handled by the the difference FDS − SDF (S is the overlap matrix), which is related to the gradiequations are solved by “direct inversion”, thus the DIIS. the−1same set of is used for generating an of extrapolat ingcoefficients the name occupied–virtual mixing of the orbitals. ∑ Computational cost & reduced-scaling approaches Computer time Space Introduction Introduction Scaling vs. prefactor (a)(a) (b)(b) 5) 5 O(M O(M ) 3) 3 O(M ) O(M O(M) O(M) Molecule size (M) Molecule size (M) Computation time Computation time Computation time Computation time 2 2 2.7 M 2.7M 2 2 MM 2.7 MM 2.7 MM Molecule Moleculesize size(M) (M) Direct SCF: Motivation Conventional: Store integrals to disk #ERIs = N4 N4/8 (permutation symmetry) 100 BFs 800 MB 100 MB 1000 BFs 7.5 TB 0.9 TB Prohibitive for large systems Direct SCF (“D-SCF”) Idea: Recompute integrals as needed Advantages: Circumvent storage bottleneck What else? Computational cost of SCF Computationally dominant steps: Fock build J, K Diagonalization or Energy minimization diag(F) E[P] = min decay exponentially with distance. Therefore, the number of basis functions wn decay exponentially with distance. Therefore, the number of basis functions wn overlapping with asymptotically(for (forlarge large molecules) overlapping withthe thefunction function wwmm will will asymptotically molecules) remain constant with size(in (ina away wayone one imagine remain constant withincreasing increasing molecular molecular size cancan imagine a a ‘‘sphere’’ around asshown shownininFigure Figure Overall ‘‘sphere’’ aroundthe theselected selected basis basis function function as 3).3). Overall there areare OðMÞ basis-function eachofofthe thetwo two electrons, there OðMÞ basis-function pairs pairs describing describing each electrons, so so 2 2 a total OðM two-electron integrals integrals results: thatthat a total of of OðM Þ Þtwo-electron results: Basis function products ðð 11 ðr Þw ðr Þ w 1 1 m 1 Þwnnðr1 Þ ðr wm|fflfflfflfflfflfflfflffl ffl{zfflfflfflfflfflfflfflfflffl}ffl} rr12 12 |fflfflfflfflfflfflfflfflffl{zfflfflfflfflfflfflfflffl OðMÞ OðMÞ wl ðr 2 Þw 1 dr 2 2 w Þwssðrðr2ffl}2Þ Þdrdr 1 dr l ðr2ffl{zfflfflfflfflfflfflfflffl |fflfflfflfflfflfflfflffl |fflfflfflfflfflfflfflfflffl{zfflfflfflfflfflfflfflfflffl} OðMÞ ½11$½11$ OðMÞ Figure 3 Illustration of the basis functions-pair domain behavior. For a given basis function (or shell) m, only those n’s must be considered whose overlap integrals Direct SCF: Coulomb (J)µ⌫ = X P (µ⌫| ) DO m = 1, N O(N ) DO n = 1, N DO l = 1, N O(N ) DO s = 1, N Jmn += Psl * gmnls ENDDO s ENDDO l ENDDO n ENDDO m µ!⌫ ! Cost: 2 O(N ) Integral storage: O(1) More tricks possible … O(N ) Schwarz Integral Estimates 2 Although the asymptotic OðM Þ scaling of the four-center two-electron Integral screening 20 integrals had been known at least since 1973, it was only in 1989 in the 21 seminal work of Ha¨ ser Methods and Ahlrichs that an efficient and widely accepted 10 Linear-Scaling in Quantum Chemistry way of rigorously preselecting the numerically significant two-electron integrals was introduced: it not only avoids the bottleneck of storing the huge number of four-center 1 1 two-electron integrals, but$also it becomes possible the two-electron 2 % jðlsjlsÞj 2 ¼to jðmnjlsÞj jðmnjmnÞj Qscreen ½12( mn Qls integrals in combination with the corresponding one-particle density matrix P available in each iteration. For screening example, when calculating theupper Coulomb part Schwarz inequality This so-called Schwarz integral provides a rigorous bound to (Jmnfour-index ), integrals integrals, are neglected if their contributions to the Coulomb matrixtwoare the while requiring just the computation of simple !# below a selected threshold of 10 : index quantities. In this way, small four-index integrals below a chosen threshold can be neglected and the formal M4 scaling associated with the formation !# 2 neglect ðmnjlsÞ; if jP j $ Q Q % 10 ½14' ls to OðM mn lsÞ. As we will see shortly, of the Fock matrix in HF theory is reduced this was a breakthrough for direct SCF methods19,21,22 and increased the An analogous screening criterion may be formulated for the exchange matrix applicability of SCF methods dramatically. Kmn . In contrast to computing the two-electron integrals before an SCF calcuIt has to be pointed out that in nondirect SCF, the density matrix is not lation, the key feature of the direct SCF method is the recalculation of twoavailable for screening, because all integrals are calculated prior to the SCF electron integrals in each SCF iteration. Although this recalculation has the run. Equation [12] has to be employed for screening instead of Eq. [14]. Direct SCF: Exchange (K)µ⌫ = P X P (µ | ⌫) µ! = e ⌘r for insulators n = 1/r for metals r!1 O(N ) DO m = 1, N DO n = 1, N O(1) DO l = 1, N O(1) DO s = 1, N Kmn += Psl * gmsln ENDDO s ENDDO l ENDDO n ENDDO m O(N ) ! !⌫ Reduced scaling techniques: Fock build Formal h J K diag(F) O(N 2 ) 4 O(N ) 4 O(N ) 3 O(N ) AO P CFMM sparsity sparsity GvFMM O(N ) O(N ) 2 O(N ) O(N ) O(N 2 ) O(N ) O(N ) 3 3 3 O(N ) O(N 2 ) O(N ) 2 O(N ) O(N ) O(N ) Scaling with system size Reduced scaling techniques: Fock build RI / Cholesky 2 h J K O(N ) diag(F) O(N 3 ) 4 O(N ) O(N 4 ) O(N 2 ) 3 O(N ) O(N 3 ) Scaling with basis set size ðnÞ ðnÞ$ II ¼ h þ P F are given by F ¼coupling h þ P with $ IIthe density matrix. discarded due to the missing Awith further improvement ontwo-electron integral screening can beofachieved by em II as the antisymmetrized integrals. Instead constructing ðn!1Þ ðn!1Þ ðnÞiteration, ðnÞ ¼ h þ P $ II F the full Fock matrix in each a recursive scheme asmatrices in the following difference densities (c.f. Ref. 19 and 21). The Fock of iterat F ¼ h þ P $ II Incremental Fock builds equation may be used: n ! 1 are given by ðn!1Þ ðn!1Þ F ¼ hðn!1Þ þP $ II antisymmetrized two-electron integrals. Instead F ¼F þ !P $ II ½16' of ðnÞ ðnÞ F ðnÞ ¼ h þ P $ II matrix in eachdensity iteration, recursive scheme as in with the difference !P for theanth iteration defined as the antisymmetrized two-electron integrals. Instead of con ðn!1Þ ðn!1Þ ¼hþP $ II F yock be matrix used: in each iteration, a recursive scheme as in the ðnÞ ðnÞ ðn!1Þ ½17' !P ¼ P ! P ðnÞ ðnÞ hmay II asbe theused: antisymmetrized two-electron integrals. Instead of constr Within this scheme, the number of two-electron integrals needed for the ðnÞ ðn!1Þ a recursive ðnÞ scheme as in the foll fullFock Fock matrix in each ðnÞ iteration, matrix updates each iteration may be$ screened by replacing F !PðnÞ¼$ II Fin ðn!1Þ þ !P II ðnÞ ation belsused: Pls may with !P in Eq.F[14] for the Coulomb part ¼F þ !P $ II ðnÞ rence density !P difference for the nth iteration defined as ½18' ! ! ! ðnÞ ! ðnÞ ðn!1Þ ðnÞ !# ðnÞ ! ! F ¼ F þ !P $ II $ Q neglect ðmnjlsÞ; if !P Q % 10 mniteration ls ! density !P for !thelsnth ðnÞ defined as h the difference density !P for the nth iteration defined as ðnÞ ðnÞ ðn!1Þ ðnÞ ðnÞ ðn!1Þ !P ¼P P !P ¼ !P ðnÞ ¼P !PP ! ðnÞ !P ðn!1Þ hin scheme, thenumber number of of two-electron integrals needn thisthis scheme, the two-electron integrals Reduced scaling techniques: Diagonalization alternatives diag(F) 3 O(N ) E[P] = min 2 O(N ) O(N ) Scaling with system size Linear-Scaling in Quantum Chemistry Sparsity ofMethods theMethods AO matrix 36 Linear-Scaling in density Quantum Chemistry 36 AT4 Exchange-Type Contractions Number of significant elements DNA fragments (6–31G*) Figure 12 Significant elements in (a) the canonical MO coefficient matrix ðCÞ and (b) Figure 12 the Significant elements (a) theðPÞcanonical coefficient matrix ðCÞfour andDNA (b) base # one-particle densityinmatrix computedMO at the HF/6-31G level for # the one-particle density matrix ðPÞ computed at the HF/6-31G level for four $7 DNA base pairs (DNA are15000000 colored in 4 ). Significant elements with respect to a threshold$7of 10 pairs (DNA ). Significant elements with respect to a threshold of 10 are colored in 4 black. black. C P 10000000 as one-particle density matrix is of central importance for the scaling behavior one-particle is ofthe central importance the scaling willdensity becomematrix clear from discussion below.for Therefore, it is behavior crucial to as discuss will become from theofdiscussion below.density Therefore, it is crucial to[8]). discuss firstclear the behavior the one-particle matrix (Equation first the behavior the one-particle density matrix matrixðCÞ (Equation Theofcanonical MO coefficient and the [8]). one-particle density 5000000 # (P) are depicted in Figure 12 as the HF/6-31G level for Thematrix canonical MO coefficient matrix ðCÞcomputed and the at one-particle density # DNA fragment four pairs (DNA ). Here, negligible matrix matrix (P)a are depicted in with Figure 12 base as computed at 4the HF/6-31G level for ele$7 ments below 10 (DNA are plotted in white, whereas significant a DNA fragment witha threshold four baseofpairs negligible matrix ele- ele4 ). Here, $7 ments are shownof in figure clearlywhereas illustrates that basically ments below a threshold 10black. are The plotted in white, significant ele- 0 0no elements occur in the canonical MO coefficient matrix, no because ments arenonsignificant shown in black. The figure clearly illustrates that basically the canonical areintypically delocalized the entire system. This is difnonsignificant elementsMOs occur the canonical MO over coefficient matrix, because C O(N2) P O(N) 10–5 P O(N) 10–6 # signif. elements 100 200 300 Number of atoms 400 Figure 13 Scaling behavior of significant elements in the one-particle density matrix Density Matrix-Based Energy Functional sity matrix one-particle throughout. For achieving such"a reformuladensity! matrix, discussed, for a formulation of 1 SCF theories suitable for large molein a densityAsmatrix-based way, we can start by looking E ¼ tr Ph þ PGðPÞ ½109' cules, it is necessary to avoid the nonlocal MO coefficient matrix, which is dE 2 ! a slightlyconventionally different obtained viewpoint. To solvethethe problem, ¼ 0 we employ by diagonalizing FockSCF matrix. Instead alternatives dP such a reformulathe one-particle density e theDiagonalization energy functional ofmatrix throughout. For achieving where GðPÞ integral tion ofdenotes SCF theorythe in atwo-electron density matrix-based way, matrices we can startcontracted by looking with ! " at SCF theory from a slightly different viewpoint. To solve the to SCFchanges problem,conditio the density matrix. We1two minimize the energy with respect in the enforcing constraints: First, the idempotency minimize the energy functional of Ewe¼need tr toPh þ PGðPÞ ½109' one-particle density matrix, equation 2needs to be!accounted" for: 1 dE ! E ¼ tr Ph¼ þ 0 PGðPÞ 2 dP ½109' ½110' ½110' ½111' PSP ¼ P es the two-electron integral matrices contracted with where GðPÞ denotes thewith two-electron integral matrices in contracted with We minimize the energy respect to changes the enforcingthetwo constraints: First, the the idempotency condition of thein following density matrix. We minimize energy with respect to changes the and, second, the number of electrons N must be correct: matrix, density matrix, for: equationone-particle needs to be accounted dE ! ¼0 dP dE ! ¼P 0 PSP ¼ dP tr ðPSÞ ¼ N ½110' enforcing constraints: First, the N idempotency and, second, the two number of electrons must be condition correct: of the following equation needs to be accounted for: raints: First, the idempotency condition of the following PSP ¼ ¼ PN tr ðPSÞ e accounted for: and, second, the number of electrons N must be correct: PSP ¼ P tr ðPSÞ ¼ N ½111' ½111' ½112' ½112' P. The method converges quadratically toward the fully idempotent matrix. In passing by, we note thatThe the function originalx~LNV wasS designed for in Figure 16 and possesses ðxÞ ¼ approach 3x2 # 2x3 (for ¼ 1) is shown orthogonal basis functions. Nunes and Vanderbilt later a generalizatwo stationary points at fpresented ð0Þ ¼ 0 and f ð1Þ ¼ 1. The purification transforma89 tion to nonorthogonal problemstion(see well later workquadratically by White ettoal.aninidempotent density matrix (Eq.as[114]) converges TB whose eigenvalues are either was 0 or 1, which correspond to virtual or occupied Ref. 122). A modified LNV scheme for SCF theories introduced by chem 90 respectively. The necessary convergence condition is that the starting Ochsenfeld and Head-Gordon.68states, Similarly Millam and Scuseria presented are HF in the range (#0.5, 1.5). as well an extension of the LNVeigenvalues algorithmoftoP the method. In the derivation of density matrix-based SCF theory below, we do not employ the chemical potential introduced by LNV,88 but instead we ~x 2 because McWeeny’s follow the derivation of Ochsenfeld and Head-Gordon, purification automatically preserves the electron number.68 Therefore, to 1.5 avoid the diagonalization within the SCF procedure, we minimize the energy functional 1 ! " ~ðH # m 1Þ ELNV ¼ tr P McWeeny “purification” ½113& ~ denotes the purified density matrix: atrix and P ~ ¼ 3PSP # 2PSPSP ! 1 " P ~ ¼ tr P ~h þ P ~GðP ~Þ E 0.5 ½115' 2 0 −1 0.5 1 ~ is−0.5 with respect to density-matrix changes, where P the inserted purified density. 120,121 −0.5 matrix (e.g., starting The simplest approach is therefore to optimize the density ½114& 1.5 x allows one to n transformation of McWeeny idempotent density matrix P; a more idempotent matrix 120 ges quadratically toward the fully idempotent matrix. 2 3 x # 2x (for S ¼ 1) is shown in Figure 16 and possesses at f ð0Þ ¼ 0 and f ð1Þ ¼ 1. The purification transformaerges quadratically to an idempotent density matrix either 0 or 1, which correspond to virtual or occupied he necessary convergence condition is that the starting −1 ~ ¼ 3x2 # 2x3 . Figure 16 ‘‘Purification’’ function x derivation of Methods density matrix-based theory below, we do 52In the Linear-Scaling in Quantum SCF Chemistry not note employ the chemical introduced bycovariant LNV,88 but instead we the we immediately that potential this gradient is a fully tensor. Because follow the derivation Ochsenfeld and because McWeeny’s density matrix is fullyofcontravariant in Head-Gordon, ‘‘covariant integral representation,’’ it ðiÞ 68 with a trialautomatically density matrix Pgenerate ) by searching for annumber. energy minimum along purification preserves the electron Therefore, tothe is tensorially inconsistent to a new density matrix by adding the fully direction of the negative energythe gradient: ðiÞ energy avoid the diagonalization SCF procedure, we minimize covariant gradient to thewithin fully contravariant density matrix Pthe . Converting functional the covariant to contravariant indices by applying the inverse metric, we ðiÞ qE½P ( & the ½116( ! ¼ PðiÞ % s of "ðiÞenergy gradient116 as follows: find the tensorially consistentPðiþ1Þ formulation 1 ~ ~ qP ~ ~ E ¼ tr Ph þ PGðPÞ ½115' where s is the step length. The gradient2is built by forming the derivative of the Diagonalization alternatives mn ml sn ðrEÞ ¼ g ðrEÞ g ls energy functional (Eq. [115]): ½119( ~ is the%1 ml sn with respect to density-matrix changes, P inserted purified density. %1where ¼ ðS Þ ðrEÞls ðS Þ The simplest ~ approach ~ qP ~ is therefore to optimize the density matrix (e.g., starting qE qE ¼ ¼ 3FPS þ 3SPF % 2SPFPS % 2FPSPS % 2SPSPF ½117( ~ qP fully With this the qP qPcontravariant energy gradient results: E At convergence expression reduces to the usual criterion qE this%1energy gradient %1 %1 3S It FP þ 3PFS%1to% note 2PFPthat % 2S FPSP % 2PSPFS ½120( ¼ SPF ¼ ofðrEÞ FPS % ¼ 0. is important the covariant energy gradient qP (Eq. [117]) cannot be added directly to the contravariant one-particle density Computational advantage? matrix, so that a transformation with the metric is required. Because all matrices for its formation can be built in a linear-scaling Therefore, let us look briefly at the tensor properties of the energy grafashion because they gradient are sparse systems a nonvanishing dient. and Rewriting the energy (Eq. for [117]) in tensorwith notation, use optimization techniques ! to find minimum ~ qE ¼ ðrEÞmn ¼ 3Fml Pls Ssn þ 3Sml Pls Fsn qP mn % 2Sml Pls Fsa Pab Sbn % 2Fml Pls Ssa Pab Sbn ls Pab ½118( Fast Multipole Method (FMM) CFMM GvFMM Multipole expansion: (µ⌫| )= XX “interaction tensor” qkn (A)Tkn,lm (AB)qlm (B) kn lm A B multipoles Translation operator: X Vlm (B 0 ) = ⇥lm,kn (B 0 B)Tkn,lm (AB)qlm (B) kn A hierarchical (tree) code allows to calculate all Coulomb interactions at O(N) cost. (C)FMM / GvFMM L0 L1 L2 L3 log(N) levels electron integrals is an extremely expensive step as far as disk space and input/ output (I/O) time are concerned. For large molecules the required disk space (and calculation time, see discussion below) easily exceeds all available capacities. Almlo¨ f et al.19 observed in a AT seminal paper that recomputing inten DNA base pairs grals whenever needed, rather than storing them to disk, could not only be Illustrative timings 12 Conventional Fock matrix formation Diagonalization CFMM/LinK Fock matrix formation 10 CPU time [h] 8 6 4 2 0 0 2000 4000 6000 8000 Number of basis functions 10000 Reduced-scaling papers are tricky … error speed Density functional theory (DFT) > 90% of all calculations are DFT Fµ⌫ = hµ⌫ + Jµ⌫ xc Kµ⌫ + vµ⌫ Techniques discussed in sessions 1-2 apply also to DFT. References this session is based on • Introduction to Computational Chemistry, by Frank Jensen, ISBN-13: 978-0470011874. My favorite book for getting a quick start into almost any area of computational chemistry. Relevant chapters: 3-6, 9-12. • Szabo & Ostlund: Modern Quantum Chemistry, Dover Publications • Ochsenfeld, Kussmann, and Lambrecht: Linear-Scaling Methods in Quantum Chemistry; Rev. Comp. Chem. Vol. 23, Eds. K. Lipkowitz, p. 1-82 (2007) Backup Slides ferred over the MO coefficients inNote eq.elements (3.54) forand optimization, the lat ational parameters contained in an XFock matrix. that the X-variables aresince pre∂ x ia and E″ (0)) can be written in terms of matrix two-electron inteational parameters contained in an X matrix. Note that the X-variables are p erred over the MO coefficients in eq. (3.54) for optimization, since the latte not independent (the MOs must be orthonormal). The exponential may be w ferred over the MO coefficients in eq. type (3.54)wave for optimization, sinceare the given latter are 17 ( 2 grals in the MO basis. For an RHF function these in eq. ferred over the MO coefficients in eq. (3.54) for optimization, since the latter d ij must f a be Fbe fand d ab f i Fexpanded f −the + ∂ asEaMOs borthonormal). j The ⎡expansion, series energy in terms of⎤the X-variables d not independent (the must orthonormal). exponential may be wr not independent (the MOs The exponential may be written 4 ⎢ MOs must mixing =occupied–virtual (3.68).not independent 16 exponential may be writ (the be orthonormal). The ⎥ ing the of the orbitals. as a series expansion, and the energy expanded in terms of the X-variables describ- des s a series expansion, and the energy expanded in terms of the X-variables ∂ ∂ x x f f f f f f f f f f f f − − ⎣ ⎦ ia jb i b the a energy j i j 16 a b in terms i a of jthe b X-variables descr a series expansion, and expanded ingasthe occupied–virtual mixing of the orbitals. X ∂E 1 16 e = + X + XX + .16. . 1 ng the occupied–virtual mixing of the orbitals. = 4 f i F fmixing 2 ing the occupied–virtual of the orbitals. a 1 The gradient of the energy off-diagonal element1 of the molecular Fock m ∂xia e Xis= an + XX + . . . 1+ X E1 (2X ) = E (3.67) X 1 ( 0 ) + E ′( 0 )X + 2 XE ′′(0 )X + . . . X (3.68) 2e e = + X + XX + . . . 1 which is easily calculated from the atomic Fock matrix. The second deriv 1 = + X + XX + . . . 1 d f F f d f F f − + ∂ E E(X )⎡= E ij ( 0a) +2Eb′( 0 )X2ab ⎤ + 2 iXE ′′j(0)X + . . . (3.( The first = and to the X-variables 4 ⎢second derivatives of the1 energy with respect ⎥⎦.AO however,The involves two-electron integrals that require an to MO transfo E X E 0 E 0 X X E 0 X . . = + + + ′ ′′ 1 ( ) ( ) ( ) ( ) ∂ ∂ x x f f f f f f f f f f f f − − ⎣ ia jb i b a j i j a b i a j b 2 E″) (0)) be written in terms of Fock matrix elements and two-electro first and and second derivatives of the energy with respect to the X-variables (E′(0) E (X 0 E 0 X X E 0 X . . . = Ecan + + + ′′( ) ( ) ′( ) Direct minimization (DM) 2 17 tion (see Section 4.2.1), and is therefore computationally expensive. and E″ (0)) can be written in terms of Fock matrix elements and two-electron grals in the MO basis. For an RHF type wave function theseinteare given The first and second derivatives of the energy with respect to the X-variables (E′ The gradient of the energy is an off-diagonal element of the molecular Fock matrix, 17 TheIn first and second derivatives energy with respect to the X-variables ( grals inE″ the MO basis. For of aninthe RHF type wave function these are given in eq.matri (3.68). awhich density matrix formulation, the energy depends on the density and (0)) can be written terms of Fock matrix elements and two-electron in is easily calculated from the atomic Fock matrix. The second derivative, 17 nd E″however, (0)) caninvolves be written in terms matrix grals in the MO basis. For anFock RHF type wave function these arecontracti given in ments as(3.68). variables, and can formally be written aselements the trace of two-electron the two-electron integrals that require an AO to and MO transforma∂Eof = 4 fi F fa 17 (3.68). ∂ E rals density intion the(see MO basis. For an RHF type wavehfunction these are given i Section 4.2.1), andone-electron is therefore computationally the matrix with the matrix andexpensive. the two-electron matr = 4 f i F f∂axia a density matrix the ⎡energy depends on the density matrix ele∂xformulation, ia implicitly 3.68). d D. with theInlatter depending ij f a F f b − d ab f i F f j + ⎤(3.68) ∂E ∂ 2 E =on 4 4 2 =ij 4f afFi fFbfbe ments as variables,∂ and written trace ⎤of the contraction of a− ⎢d ab fi F ffj as +− the ⎥ E can⎡dformally ∂ ∂ x x f f f f f f f f f f f − ⎣ i b a j ia jb i j a b i a j b ⎦ ∂=x4ia⎢one-electron ⎥ the density matrix with the matrix h and the two-electron matrix G, ( ∂E∂xiaE = trace Dh + trace DG D ) ( ) ( ( ) ) ∂x(jbD f f f f f f f f f f f f − − ⎣ ⎦ (3.m i b a j i j a b i a j b 2f F 4 f = ELECTRONIC STRUCTURE METHODS: INDEPENDENT-PARTICLE MODELS The gradient of the energy is an off-diagonal element of the molecular Fock d f F f d f F f − + i a ∂ E a D. b ab i j ⎡ ij on ⎤ with the latter depending implicitly 4 ⎢off-diagonal ∂which xiathe The gradient of energy is=an element offreely, theFock molecular Fock matrix, is easily calculated from the atomic matrix. The second deri ⎥ density matrix elements cannot be varied however, as the or ∂xEia(∂D x jb) = trace f f f f f f f f f f f f − − ⎣ ⎦ i( Dh b )a+ trace j i j a b i a j b (3.69) ( (DGmatrix. (D)) The second derivative, which is easily calculated from the atomic Fock The 2 however, involves two-electron integrals that require an AO to MO trans d f F f d f F f − + ∂ E optimization step, but the non-idempotent density matrix derived from taking ij a b ab i j ⎡ ⎤ must remain orthonormal, and this constraint can be formulated as the de however, involves two-electron integrals that require an AO to MO transformaThe gradient of the energy is an off-diagonal element of the molecular Fock mat tion (see Section 4.2.1), and is therefore computationally expensive. 4⎢ =elements 18 The density matrix cannot be varied freely, however, as the orbitals ⎥ optimization step can be “purified” by the McWeeny procedure. matrix having to be idempotent, DSD = D. It is difficult to ensure this duri tion (see and isftherefore computationally expensive. ∂xSection ∂easily x In f f f f f f f f f f f − − a density matrix formulation, the energy depends on the density matr ⎣ ⎦ ia jb 4.2.1), which is calculated from the atomic Fock matrix. The second derivati i b a j i j a b i a j b must In remain orthonormal, and this constraint can be formulated as the density a density matrix formulation, energy depends on theas density matrix elements as variables, andthe can formally written the trace of the contrac 2 3be require however, involves two-electron integrals that an AO to MO transform (3. Dformally =be3=D 2isD matrix having to be idempotent, DSD D. −Itelement ensure this during purified ments variables, andiscan written asdifficult thematrix trace of the contraction of an ma The gradient ofasthe energy an off-diagonal oftothe molecular Fock m the density matrix with the one-electron h and the two-electron tion (see Section 4.2.1), and is therefore computationally expensive. the density matrix with the depending one-electron matrixFock hon and the two-electron matrix G,deriva with the latter implicitly D. which is easily calculated from the atomic matrix. The second In a density matrix formulation, energy depends on the density matrix eo The idempotency condition ensures thattheeach orbital is occupied by exactly with the latter depending implicitly on D. ments as two-electron variables, and can formally bethis written as the trace contraction however, integrals that require an AO tothe MO transfo E (D) = trace (Dh ) + trace (DG (Dallow )) of electron.involves E. Cancès has shown that relaxing condition to fractional oc A B C
© Copyright 2025 ExpyDoc