Session 2: Solving the SCF equations

Session 2:
Solving the SCF equations
Daniel Lambrecht
Department of Chemistry
University of Pittsburgh
Quantum Chemistry Workshop
May 7th 2014
Pittsburgh, PA
Self-Consistent Field (SCF) Procedure
Calculate F(C)
FC = eSC
from Atkins/Friedman
Choices for the “initial guess”
core Hamiltonian = “zero guess”
Superposition of Atomic Densities (SAD)
Extended Hückel Theory (EHT)
…
Solving the generalized eigenvalue problem
FC = SC✏
S
S
1
2
1
2
1
2
FC = S
1
2
FS
0
SC✏
1
2
1
2
S C = S C✏
0
0
FC =C✏
diag(F’)
oscillations. A given density matrix Dn gives a Fock matrix Fn, which, upon diagonalization, gives a density matrix Dn+1. The Fock matrix Fn+1 from Dn+1 gives a
density matrix Dn+2 that is close to Dn, but Dn and Dn+1 are very different, as illustrated in Figure 3.5. The damping procedure tries to solve this by replacing the
current density matrix with a weighted average, D′n+1 = wDn + (1 − w)Dn+1. The
weighting factor w may be chosen as a constant or changed dynamically during
the SCF procedure.
Converging the HF solutions
Density
D 0 F0
D 2 F2
D4 F 4
Converged
value
Iterations
D 1 F1
1.
2.
3.
4.
5.
D 3 F3
D 5 F5
Figure 3.5 An oscillating SCF procedure
Extrapolation
(3) Level shifting. This technique is perhaps best understood in the formulation of a
rotation of the MOs that form the basis for the Fock operator (Section 3.6). At
Damping
convergence, the Fock matrix elements in the MO basis between occupied and
virtual orbitals are zero. The iterative procedure involves mixing (making linear
Level
shifting
combinations of) occupied and virtual MOs. During the iterative procedure, these
mixings may be large, causing oscillations or making the total energy increase. The
Direct
inversion in the iterative subspace (DIIS)
degree of mixing may be reduced by artificially increasing the energy of the virtual
orbitals. If a sufficiently large constant is added to the virtual orbital energies, it
Direct
minimization
can be shown
that the total energy(DM)
is guaranteed to decrease, thereby forcing con13
vergence. The more the virtual orbitals are raised in energy, the more stable is the
∑
( ) ci = 1(E n +1= n +1c) E
erged solution has an error of zero, and the DIIS method forms a linear comn +1 iteri i
hifts,
convergence
is
guaranteed,
but
it
is
likely
to
occur
very
slowly,
and
density
matrices
(F
,
F
,
F
,
.
.
.
and
D
,
D
,
D
,
.
.
.)
are
produced.
At
each
0
1
2
0
1
2
n ground
andindicators
maywe
in that,
some
cases
converge
to
state that
is not
the
state.
ion of the error
in ato
least
squares
sense,
is aaminimum
(as close
try
find
the
point
with
lowest
error,
which
iis
=0 not necessari
i
=
0
ation,
it
is
also
assumed
that
an
estimate
of
the
“error”
(E
,
E
,
E
,
.
.
.)
is
available,
0 E1
me
cases
to
a state
that
is not the
ground
state.
=2
ci Eni was developed by
ro as
possible).
In the
function
space
generated
by
the
previous
iterations
n +1procedure
(4)converge
Direct
inversion
in
the
iterative
subspace
(DIIS).
This
points
actually
calculated.
It
is
common
to
use
the
trace
(sum
of
diago
i.e.
how
far
the
current
Fock/density
matrix
is
from
the
converged
solution.
The
3.8
SCF
TECHNIQUES
i
=
0
14
ry
to
find
the
point
with
lowest
error,
which
is
not
necessarily
one
of
the
n in the iterative
subspace
(DIIS).
This procedure
was developed
by
1 efficient in co
cinormalization
=very
P.
Pulay
and
is
an
extrapolation
procedure.
It
has
proved
to
be
Minimization
of
the
ErrF
subject
to
the
converged
solution
has
an
error
of
zero,
and
the
DIIS
method
forms
a
linear
comof
the matrix
product
the
error
matrix
with
itself as a scalar in
n
tsan
actually
calculated. Itprocedure.
is common
to14use
the
trace
(sum of
of
diagonal
elements)
i =0
extrapolation
It
has
proved
to
be
very
efficient
in
forcing
convergence
and itself
in
reducing
the
number
iterations
the
same
time,
bination
of
the
error
indicators
that,
in aa least
squares
sense,
isof
a minimum
(asat
close
e matrix product
of the
error
matrix
with
as
scalar
indicator
of
the
1
c
=
Lagrange
method
(Section
12.5),
and
leads
to
the
i
n followi
error.
gence
and
in
reducing
the
number
of
iterations
at
the
same
time,
to
zero
as now
possible).
space generated
previous
iterations
and
it is
one In
of the
thefunction
most commonly
used
for
helping
SCF
conver.
ithe
=0 ErrF
Minimization
ofbymethods
the
subject
to the
normalization
*
F
=
ci Fi
n normalizat
where
l
is
the
multiplier
associated
with
the
ne of the we
most
commonly
used
methods
for
helping
SCF
convertry
to
find
the
point
with
lowest
error,
which
is
not
necessarily
one
of
the
gence. The
idea
is as
follows.
AsLagrange
the iterative
procedure
runs,
a sequence
of
Fock
method
(Section
12.5),
and
leads
to
the
follo
i
=
0
ErrF
c
=
trace
E
⋅
E
(
)
(
)
ErrF
c
=
trace
E
⋅
E
(
)
(
)
n
+
1
n
+
1
n
+
1
n
+
1
Minimization
the
ErrF
to the normalization
constraint i
pointsAs
actually
calculated.
It is commonruns,
toofuse
the
tracesubject
(sumof
of Fock
diagonal
elements)
a is as follows.
the matrices
iterative
procedure
a
sequence
and density
(F
,
F
,
F
,
.
.
.
and
D
,
D
,
D
,
.
.
.)
are
produced.
At
each
iter0
1
2 where l is0 the
1 multiplier
2
n
associated
with
the
normaliz
n
of
the
matrix
product
of
the
error
matrix
with
itself
as
a
scalar
indicator
of
the
method
12.5),
leads
to the
following
set
of
l
a11the
aeach
L
a
−
1
c
0
The(Section
only
remaining
question
is
the
nature
of
the
trices (F0, ation,
F1, F2,it. .is. Ealso
and
, iD
are
produced.
At
iterEiLagrange
12 and
1
n
1
0c
1, Dthat
2, . . .)
⎛
⎞
⎛
⎞
⎛
⎞
n +1 = D
assumed
an
estimate
of
“error”
(E
,
E
,
E
,
.
.
.)
is
available,
∑
0
1
2
error.
En +associated
(3.64)ci E iwith the normalization.
i =0
1=
where l is the multiplier
∑
∑
∑
Direct inversion in the iterative subspace
(DIIS)
∑
∑
∑
a12 L
−1⎞ ⎛ cThe
assumed that
an estimate
the “error”
(E0, E
is from
available,
the
difference
FDS
SDF
(S isathe
overlap
11 − converged
1solution.
n
1 ⎞ matr
⎛ athe
⎛ ⎟0
1, E
2, . . .)is
i.e. how
farn the of
current
Fock/density
matrix
⎜
⎟
⎜
⎟
⎜
i =0 L
a
a
a
−
1
c
0
21
22
2
n
2
ErrF
c
=
trace
E
⋅
E
(
)
(
)
n
+
1
n
+
1
e current Fock/density
is from
the of
converged
solution.
ent
the
SCF
energy
withforms
respect
to
⎜ aThe
⎟⎞⎟⎜com⎟ coe
⎜ ⎟0
ci matrix
=1
L
a⎞2a⎟n linear
−01the
cMO
converged∑solution
has
an error
zero,of
and
the
DIIS
method
21 a1na22 −1
2 ⎜
a
a
L
c
⎜
⎜
n
11
12
1
⎛
⎞
⎛
⎛
n
i =0
⎜com⎟ ⎜close
⎟ M⎜
tion has an
error
of
zero,
and
the
DIIS
method
forms
a
linear
=
to
work
well
in
practice.
A
closely
related
method
M
M
O
M
M
M
bination of the error indicators
that,
in
a
least
squares
sense,
is
a
minimum
(as
En +1 = ∑ ci Ei
⎜
⎜ a21 ci a=221 L
⎟ ⎜ c 2 ⎟M⎟ ⎜15
⎜ 0M ⎟⎟⎟⎜ M ⎟⎜ = ⎜ ⎟
M a2n M −1O
⎜
mization
of the
subject
normalization
constraint
isminimum
handled
by(as
the
(3.64)
error indicators
intoa the
least
squares
sense,
is aspace
close
i =0
cator,
and
has
the
acronym
to ErrF
zerothat,
as possible).
In the function
generated
by the previous
⎜
⎟EDIIS.
⎜ ⎟⎟ ⎜⎜iterations
⎟⎟⎜ ⎟⎜ ⎜ ⎟
i
=
0
⎜
⎜
⎟
aL
L
a1⎟nn= ⎜c−Mn1⎟ cn 0 0
ange method (Section 12.5), and leads to then following set of linear
equations,
a
a
−
M
M
O
M
M
n 1 M ana
2nn
n
1
n
2
⎜
⎟
⎜
sible).
In the
function
space
generated
by
the
previous
iterations
(5)
“Direct
minimization”
techniques.
we try associated
to find the
with
lowest error,
which ⎜is not necessarily⎟ The
one ⎟variationa
of
⎜ the
⎟⎜ ⎜ ⎟
1
c
=
⎜
⎜
⎟
e l is the multiplier
with point
the normalization.
i
∑
⎝ −normalization
⎠⎟⎠⎝ − l ⎠⎝ is⎝ ⎠h
⎟ ⎜function
⎟1⎠constraint
⎜elements)
1
−
1
L
−
0
−
Minimization
of
ErrF
subject
the
ato
L
a
−
1
c
the pointpoints
with lowest
error,
which
isis common
not
necessarily
one
of
the
⎝⎜⎜−ause
⎝
n1
1 the
n−
2 trace
nn
n
i =0Itthe
to minimize
the
energy
as
a
of
the
MO
actually
calculated.
to
(sum
of
diagonal
1
L
−
1
0
l
1
−
−
⎟⎜ ⎟ ⎜ ⎟
a
a
L
a
−
1
c
0
11
12
1
n
1
⎛
⎞
⎛
⎞
⎛
⎞
calculated.Minimization
is common
to ErrF
usemethod
the
(sum
of
diagonal
elements)
⎝ −1matrix
⎠given
⎝following
⎝ −eq.
⎠of
Lagrange
12.5),
and
leads
the
set
of lin
aijas
trace
E
E
=−handled
⋅indicator
ofItthe
matrix
product
oftrace
theto(Section
error
matrix
with
itself
a1toscalar
the
(
density
elements,
as
(3.54).
In
−
1
L
0
l
1
i−
j⎠) by
of
the
subject
the
normalization
constraint
is
by
the
⎜ a21 a22 L a2n −1⎟ ⎜ c 2 ⎟ ⎜ 0 ⎟
)
product ofLagrange
the
matrix
with
itself
a no
scalar
of(ofE
the
ij = trace
i ⋅ E normalization.
jequations,
error.
where
l is
the
with
the
method
(Section
and
toassociated
theaaindicator
following
set
linear
different
from
other
types
of non-linear opti
⎜ error
⎟ ⎜12.5),
⎟multiplier
⎜ as
⎟leads
Ac
=
b
ij = trace(E i ⋅ E j )
=
M
M
O
M
M
M
M
⎜ l is the multiplier ⎟associated
⎜ ⎟ ⎜ with
⎟
where
the normalization.
technique,
such as steepest
descent, conjugate
−1
⎜ an 1 an 2 L ann −1⎟ ⎜ cn ⎟ErrF
⎜ 0 (⎟ c ) = trace
Ac
=
b
⋅ Ena
(En=+L
=1A
b1 ⎞ ⎛ 0 ⎞
Ac
1b
+11)n c −
a
a
c
11
12
⎛
⎞
⎛
⎜
⎟
⎜
⎟
⎜
⎟
be used (see Chapter 12 for details
a11 ⎠⋅ E
a12 ⎠ )L⎝ a⎠1n −methods
1⎞n ⎛ c1 ⎞ ⎛ can
0 ⎞ (3.65)
⎛
⎝ −1 (c−
⎝
ErrF
E
) 1= trace
(
−
1
L −1 n +01 −nl+1 −1
−b1
c
=
A
In
iteration
n
the
matrix
(n + 1) ×
⎜
⎟ ⎜ has
⎟ dimension
⎜ 0the
⎟ variational
c
=
A
b
⎜ a21 a22 L Ea2n a=21
⎟As
⎜ cca222
⎟mentioned
⎜L
⎟ aA
in
Section
3.6,
−
1
c
−
1
0
2
n
2
E
n
n +1 SCF∑
aij = trace(Ei ⋅ E j ) ⎜
3.8
103
⎟TECHNIQUES
⎜ i ⎟ i20.
⎜ The
⎟
⎜
⎟
⎜
⎟
⎜
⎟
less
than
coefficients
c
can
be
obtained
b
(3.64)
terms
of
an
exponential
transformation
of
the
M
i
=
0
=
M
M
O
M
M
M
M
In
iteration
n
the
A
matrix
has
dimension
(n
+
1)
×
(n
+
1),
wh
E
E
=
c
i ⎜i
⎟ ⎜ M⎟ ⎜O⎟
= ⎜ M (n
M
M
M
M
Ac =nb+1 ∑In
⎜
⎟
⎜
⎟
⎟ by
iteration
n
the
A
matrix
has
dimension
+
1)
×
(n
n matrix
and
multiplying
it
onto
the
b
vector,
i.e.
in
the
“si
(3.64)
⎜
⎟
⎜
⎟
⎜
⎟
i
=
0
ational
parameters
contained
in
an
X
matrix.
N
less
than
20.
The
coefficients
c
can
be
obtained
directly
a
a
L
a
−
1
c
0
n n
−1
n1
n2
nn
c n= A b
⎜cFai *=n 1=1linear
⎟
⎜
⎟
⎜
⎟
⎜
⎟ ⎜ an⎟2 equations
⎜L⎟ ann are
(3.65)
−
1
c
0
the
solved
by
“direct
inversi
∑
n
ci⎠Fover
(3.66)
less⎝ −1than
20.
The
coefficients
c
can
be
obtained
by
⎠∑
⎝−
⎝ −onto
⎠ thethe
matrix
and
multiplying
b
vector,
i.e.
in
the
“subspace”
of
MO
coefficients
in
eq.
(3.54)
for
n ferred
i it
−1 L
−
1
0
l
1
⎜
⎟
⎜
⎟
⎜
⎟
i
=
0
eration n the A matrix
+ 1), where
n usually
is coefficients that minimize the e
1 dimension (n + 1) × (nHaving
ci =has
i =0 obtained
the
∑
⎝
⎠“direct
⎝vector,
⎠ inversion”,
⎝−
⎠orthonormal)
the
linear
equations
are
solved
by
thus
t
not
independent
(the
MOs
must
be
−
1
−
1
L
−
1
0
l
1
−
a
trace
E
E
=
⋅
(
)
matrix
and
multiplying
it
onto
the
b
i.e.
in
the
“sub
ij
i
j
than 20. The coefficients
c
can
be
obtained
by
directly
inverting
the
A
i =0
the
same
set
of
coefficients
isthe
used
for error
generating
a
Minimization
the Having
ErrF
subject
toas
the
normalization
constraint
isenergy
handled
by functio
the in
obtained
the
coefficients
that
minimize
the
Theit only
remaining
question
is
the
nature
of
the
error
function.
Pulay
suggested
a
series
expansion,
and
expanded
ix and multiplying
onto the
bof
vector,
i.e.
in
the
“subspace”
of
the
“iterations”
Aclinear
=b
the
equations
are
solved
by
“direct
inversion
a
trace
E
E
=
⋅
(
)
ij
i
j
(F*)
at
iteration
n,
which
is
used
in
place
Fn for16
Lagrange
method
(Section
12.5),
and
leads
to
the
following
set
of
linear
equations,
flinear
the ErrF
subject
to
the
normalization
constraint
is
handled
by
the
the
difference
FDS
−
SDF
(S
is
the
overlap
matrix),
which
is
related
to
the
gradiequations are solved by “direct
inversion”,
thus
the
DIIS.
the−1same
set of
is used for generating
an of
extrapolat
ingcoefficients
the name
occupied–virtual
mixing of the
orbitals.
∑
Computational cost & reduced-scaling approaches
Computer time
Space
Introduction
Introduction
Scaling vs. prefactor
(a)(a)
(b)(b)
5) 5
O(M
O(M
)
3) 3
O(M
)
O(M
O(M)
O(M)
Molecule
size
(M)
Molecule
size
(M)
Computation time
Computation time
Computation time
Computation time
2 2
2.7
M
2.7M
2 2
MM
2.7
MM
2.7
MM
Molecule
Moleculesize
size(M)
(M)
Direct SCF: Motivation
Conventional: Store integrals to disk
#ERIs =
N4
N4/8
(permutation symmetry)
100 BFs
800 MB
100 MB
1000 BFs
7.5 TB
0.9 TB
Prohibitive for large systems
Direct SCF (“D-SCF”)
Idea: Recompute integrals as needed
Advantages:
Circumvent storage bottleneck
What else?
Computational cost of SCF
Computationally dominant steps:
Fock build
J, K
Diagonalization or Energy minimization
diag(F)
E[P] = min
decay exponentially with distance. Therefore, the number of basis functions wn
decay exponentially with distance. Therefore, the number of basis functions wn
overlapping
with
asymptotically(for
(forlarge
large
molecules)
overlapping
withthe
thefunction
function wwmm will
will asymptotically
molecules)
remain
constant
with
size(in
(ina away
wayone
one
imagine
remain
constant
withincreasing
increasing molecular
molecular size
cancan
imagine
a a
‘‘sphere’’
around
asshown
shownininFigure
Figure
Overall
‘‘sphere’’
aroundthe
theselected
selected basis
basis function
function as
3).3).
Overall
there
areare
OðMÞ
basis-function
eachofofthe
thetwo
two
electrons,
there
OðMÞ
basis-function pairs
pairs describing
describing each
electrons,
so so
2 2
a total
OðM
two-electron integrals
integrals results:
thatthat
a total
of of
OðM
Þ Þtwo-electron
results:
Basis function products
ðð
11
ðr
Þw
ðr
Þ
w
1
1
m 1 Þwnnðr1 Þ
ðr
wm|fflfflfflfflfflfflfflffl
ffl{zfflfflfflfflfflfflfflfflffl}ffl} rr12
12
|fflfflfflfflfflfflfflfflffl{zfflfflfflfflfflfflfflffl
OðMÞ
OðMÞ
wl ðr
2 Þw
1 dr
2 2
w
Þwssðrðr2ffl}2Þ Þdrdr
1 dr
l ðr2ffl{zfflfflfflfflfflfflfflffl
|fflfflfflfflfflfflfflffl
|fflfflfflfflfflfflfflfflffl{zfflfflfflfflfflfflfflfflffl}
OðMÞ
½11$½11$
OðMÞ
Figure 3 Illustration of the basis functions-pair domain behavior. For a given basis
function (or shell) m, only those n’s must be considered whose overlap integrals
Direct SCF: Coulomb
(J)µ⌫ =
X
P
(µ⌫|
)
DO m = 1, N
O(N
)
DO n = 1, N
DO l = 1, N
O(N
)
DO s = 1, N
Jmn += Psl * gmnls
ENDDO s
ENDDO l
ENDDO n
ENDDO m
µ!⌫
!
Cost:
2
O(N )
Integral
storage:
O(1)
More tricks possible … O(N )
Schwarz Integral Estimates
2
Although
the
asymptotic
OðM
Þ scaling of the four-center two-electron
Integral screening
20
integrals had been known at least since 1973, it was only in 1989 in the
21
seminal
work of Ha¨ ser Methods
and Ahlrichs
that an efficient
and widely accepted
10
Linear-Scaling
in Quantum
Chemistry
way of rigorously preselecting the numerically significant two-electron integrals was introduced:
it not only avoids the bottleneck of storing the huge number of four-center
1
1
two-electron integrals,
but$also
it becomes
possible
the two-electron
2 % jðlsjlsÞj
2 ¼to
jðmnjlsÞj
jðmnjmnÞj
Qscreen
½12(
mn Qls
integrals in combination with the corresponding one-particle density matrix P
available
in each
iteration.
For screening
example,
when
calculating
theupper
Coulomb
part
Schwarz
inequality
This
so-called
Schwarz
integral
provides
a rigorous
bound
to
(Jmnfour-index
), integrals integrals,
are neglected
if their
contributions
to the Coulomb
matrixtwoare
the
while
requiring
just the computation
of simple
!#
below
a
selected
threshold
of
10
:
index quantities. In this way, small four-index
integrals below a chosen threshold can be neglected and the formal M4 scaling associated with the formation
!#
2
neglect
ðmnjlsÞ;
if
jP
j
$
Q
Q
%
10
½14'
ls to OðM
mn lsÞ. As we will see shortly,
of the Fock matrix in HF theory is reduced
this was a breakthrough for direct SCF methods19,21,22 and increased the
An analogous screening criterion may be formulated for the exchange matrix
applicability of SCF methods dramatically.
Kmn .
In contrast to computing the two-electron integrals before an SCF calcuIt has to be pointed out that in nondirect SCF, the density matrix is not
lation, the key feature of the direct SCF method is the recalculation of twoavailable for screening, because all integrals are calculated prior to the SCF
electron integrals in each SCF iteration. Although this recalculation has the
run. Equation [12] has to be employed for screening instead of Eq. [14].
Direct SCF: Exchange
(K)µ⌫ =
P
X
P
(µ | ⌫)
µ!
= e ⌘r for insulators
n
= 1/r
for metals
r!1
O(N )
DO m = 1, N
DO n = 1, N
O(1)
DO l = 1, N
O(1)
DO s = 1, N
Kmn += Psl * gmsln
ENDDO s
ENDDO l
ENDDO n
ENDDO m
O(N )
!
!⌫
Reduced scaling techniques: Fock build
Formal
h
J
K
diag(F)
O(N 2 )
4
O(N )
4
O(N )
3
O(N )
AO
P
CFMM
sparsity sparsity GvFMM
O(N )
O(N )
2
O(N )
O(N )
O(N 2 )
O(N )
O(N )
3
3
3
O(N )
O(N 2 )
O(N )
2
O(N )
O(N )
O(N )
Scaling with system size
Reduced scaling techniques: Fock build
RI / Cholesky
2
h
J
K
O(N )
diag(F)
O(N 3 )
4
O(N )
O(N 4 )
O(N 2 )
3
O(N )
O(N 3 )
Scaling with basis set size
ðnÞ
ðnÞ$ II
¼
h
þ
P
F
are
given
by
F ¼coupling
h þ P with
$ IIthe density matrix.
discarded due to the missing
Awith
further
improvement
ontwo-electron
integral screening
can beofachieved
by em
II as the
antisymmetrized
integrals. Instead
constructing
ðn!1Þ
ðn!1Þ
ðnÞiteration,
ðnÞ
¼
h
þ
P
$
II
F
the full Fock
matrix
in each
a
recursive
scheme
asmatrices
in the following
difference
densities
(c.f.
Ref.
19
and
21).
The
Fock
of iterat
F
¼
h
þ
P
$
II
Incremental
Fock
builds
equation
may
be
used:
n ! 1 are given by ðn!1Þ
ðn!1Þ
F
¼ hðn!1Þ
þP
$ II
antisymmetrized two-electron
integrals.
Instead
F ¼F
þ !P $ II
½16' of
ðnÞ
ðnÞ
F ðnÞ ¼ h þ P $ II
matrix
in eachdensity
iteration,
recursive
scheme
as
in
with
the difference
!P
for theanth
iteration
defined
as
the
antisymmetrized
two-electron
integrals.
Instead
of
con
ðn!1Þ
ðn!1Þ
¼hþP
$ II
F
yock
be matrix
used: in each iteration,
a
recursive
scheme
as
in
the
ðnÞ
ðnÞ
ðn!1Þ
½17'
!P ¼ P ! P
ðnÞ
ðnÞ
hmay
II asbe
theused:
antisymmetrized two-electron integrals. Instead of constr
Within this scheme, the number of two-electron integrals needed for the
ðnÞ
ðn!1Þ a recursive
ðnÞ scheme as in the foll
fullFock
Fock
matrix
in
each
ðnÞ iteration,
matrix updates
each iteration
may be$ screened
by replacing
F !PðnÞ¼$ II Fin ðn!1Þ
þ !P
II
ðnÞ
ation
belsused:
Pls may
with !P
in Eq.F[14] for
the
Coulomb
part
¼F
þ !P $ II
ðnÞ
rence density !P
difference
for the nth iteration defined
as
½18'
!
!
! ðnÞ !
ðnÞ
ðn!1Þ
ðnÞ
!#
ðnÞ
!
!
F
¼
F
þ
!P
$
II
$
Q
neglect
ðmnjlsÞ;
if
!P
Q
%
10
mniteration
ls
!
density !P for !thelsnth
ðnÞ
defined as
h the difference density !P
for
the
nth
iteration
defined
as
ðnÞ
ðnÞ
ðn!1Þ
ðnÞ
ðnÞ
ðn!1Þ
!P
¼P
P
!P ¼
!P
ðnÞ
¼P
!PP
!
ðnÞ
!P
ðn!1Þ
hin
scheme,
thenumber
number of
of two-electron
integrals
needn
thisthis
scheme,
the
two-electron
integrals
Reduced scaling techniques:
Diagonalization alternatives
diag(F)
3
O(N )
E[P] = min
2
O(N )
O(N )
Scaling with system size
Linear-Scaling
in
Quantum
Chemistry
Sparsity
ofMethods
theMethods
AO
matrix
36
Linear-Scaling
in density
Quantum
Chemistry
36
AT4
Exchange-Type Contractions
Number of
significant elements
DNA fragments (6–31G*)
Figure 12 Significant elements in (a) the canonical MO coefficient matrix ðCÞ and (b)
Figure 12 the
Significant
elements
(a) theðPÞcanonical
coefficient
matrix
ðCÞfour
andDNA
(b) base
#
one-particle
densityinmatrix
computedMO
at the
HF/6-31G
level for
#
the one-particle
density
matrix ðPÞ computed at the HF/6-31G level for four $7
DNA
base
pairs (DNA
are15000000
colored in
4 ). Significant elements with respect to a threshold$7of 10
pairs (DNA
).
Significant
elements
with
respect
to
a
threshold
of
10
are
colored
in
4
black.
black.
C
P
10000000 as
one-particle density matrix is of central importance for the scaling behavior
one-particle
is ofthe
central
importance
the scaling
willdensity
becomematrix
clear from
discussion
below.for
Therefore,
it is behavior
crucial to as
discuss
will become
from theofdiscussion
below.density
Therefore,
it is
crucial to[8]).
discuss
firstclear
the behavior
the one-particle
matrix
(Equation
first the behavior
the one-particle
density matrix
matrixðCÞ
(Equation
Theofcanonical
MO coefficient
and the [8]).
one-particle
density
5000000
#
(P) are
depicted
in Figure
12 as
the HF/6-31G
level for
Thematrix
canonical
MO
coefficient
matrix
ðCÞcomputed
and the at
one-particle
density
#
DNA
fragment
four
pairs (DNA
). Here,
negligible
matrix
matrix (P)a are
depicted
in with
Figure
12 base
as computed
at 4the
HF/6-31G
level
for ele$7
ments below
10 (DNA
are plotted
in white,
whereas
significant
a DNA fragment
witha threshold
four baseofpairs
negligible
matrix
ele- ele4 ). Here,
$7
ments
are shownof in
figure
clearlywhereas
illustrates
that basically
ments below
a threshold
10black.
are The
plotted
in white,
significant
ele- 0 0no
elements
occur
in the
canonical
MO coefficient
matrix, no
because
ments arenonsignificant
shown in black.
The
figure
clearly
illustrates
that basically
the canonical
areintypically
delocalized
the entire
system.
This is difnonsignificant
elementsMOs
occur
the canonical
MO over
coefficient
matrix,
because
C O(N2)
P O(N) 10–5
P O(N) 10–6
# signif. elements
100
200
300
Number of atoms
400
Figure 13 Scaling behavior of significant elements in the one-particle density matrix
Density Matrix-Based
Energy Functional
sity matrix one-particle
throughout.
For
achieving
such"a reformuladensity! matrix,
discussed, for a formulation
of 1
SCF theories
suitable
for large molein a densityAsmatrix-based
way,
we
can
start
by
looking
E
¼
tr
Ph
þ
PGðPÞ
½109'
cules, it is necessary to avoid the nonlocal
MO coefficient
matrix, which is
dE
2
!
a slightlyconventionally
different obtained
viewpoint.
To solvethethe
problem,
¼ 0 we employ
by diagonalizing
FockSCF
matrix.
Instead
alternatives
dP such a reformulathe one-particle
density
e theDiagonalization
energy
functional
ofmatrix throughout. For achieving
where GðPÞ
integral
tion ofdenotes
SCF theorythe
in atwo-electron
density matrix-based
way, matrices
we can startcontracted
by looking with
!
"
at
SCF
theory
from
a
slightly
different
viewpoint.
To
solve
the to
SCFchanges
problem,conditio
the density matrix.
We1two
minimize
the energy
with
respect
in the
enforcing
constraints:
First,
the
idempotency
minimize
the energy functional of
Ewe¼need
tr toPh
þ
PGðPÞ
½109'
one-particle
density
matrix,
equation 2needs to be!accounted" for:
1
dE
!
E ¼ tr Ph¼
þ 0 PGðPÞ
2
dP
½109'
½110'
½110'
½111'
PSP ¼ P
es the two-electron integral matrices contracted with
where GðPÞ
denotes
thewith
two-electron
integral
matrices in
contracted
with
We
minimize
the
energy
respect
to
changes
the
enforcingthetwo
constraints:
First,
the the
idempotency
condition
of thein following
density
matrix.
We
minimize
energy
with
respect
to
changes
the
and,
second,
the
number
of
electrons
N
must
be
correct:
matrix,
density
matrix, for:
equationone-particle
needs to be
accounted
dE !
¼0
dP
dE !
¼P
0
PSP
¼
dP
tr ðPSÞ ¼
N
½110'
enforcing
constraints:
First, the N
idempotency
and, second,
the two
number
of electrons
must be condition
correct: of the following
equation needs to be accounted for:
raints: First, the idempotency condition of the following
PSP ¼
¼ PN
tr ðPSÞ
e accounted for:
and, second, the number of electrons N must be correct:
PSP ¼ P
tr ðPSÞ ¼ N
½111'
½111'
½112'
½112'
P. The method converges quadratically toward the fully idempotent matrix.
In passing by, we note thatThe
the function
originalx~LNV
wasS designed
for in Figure 16 and possesses
ðxÞ ¼ approach
3x2 # 2x3 (for
¼ 1) is shown
orthogonal basis functions. Nunes
and
Vanderbilt
later
a generalizatwo
stationary
points
at fpresented
ð0Þ ¼ 0 and
f ð1Þ ¼ 1. The purification transforma89
tion to nonorthogonal
problemstion(see
well later
workquadratically
by White ettoal.aninidempotent density matrix
(Eq.as[114])
converges
TB
whose eigenvalues
are either was
0 or 1,
which correspond
to virtual or occupied
Ref. 122). A modified LNV scheme
for SCF theories
introduced
by
chem
90
respectively.
The
necessary
convergence
condition is that the starting
Ochsenfeld and Head-Gordon.68states,
Similarly
Millam
and
Scuseria
presented
are HF
in the
range (#0.5, 1.5).
as well an extension of the LNVeigenvalues
algorithmoftoP the
method.
In the derivation of density matrix-based SCF theory below, we do
not employ the chemical potential introduced by LNV,88 but instead we
~x 2 because McWeeny’s
follow the derivation of Ochsenfeld and Head-Gordon,
purification automatically preserves the electron number.68 Therefore, to
1.5
avoid the diagonalization within the SCF procedure, we minimize the energy
functional
1
!
"
~ðH # m
1Þ
ELNV ¼ tr P
McWeeny “purification”
½113&
~ denotes the purified density matrix:
atrix and P
~ ¼ 3PSP # 2PSPSP ! 1 "
P
~ ¼ tr P
~h þ P
~GðP
~Þ
E
0.5
½115'
2
0
−1
0.5
1
~ is−0.5
with respect to density-matrix changes, where
P
the
inserted
purified
density.
120,121
−0.5 matrix (e.g., starting
The simplest approach is therefore to optimize the density
½114&
1.5
x
allows one to
n transformation of McWeeny
idempotent density matrix P; a more idempotent matrix
120
ges quadratically toward the fully idempotent matrix.
2
3
x # 2x (for S ¼ 1) is shown in Figure 16 and possesses
at f ð0Þ ¼ 0 and f ð1Þ ¼ 1. The purification transformaerges quadratically to an idempotent density matrix
either 0 or 1, which correspond to virtual or occupied
he necessary convergence condition is that the starting
−1
~ ¼ 3x2 # 2x3 .
Figure 16 ‘‘Purification’’ function x
derivation of Methods
density matrix-based
theory below, we do
52In the
Linear-Scaling
in Quantum SCF
Chemistry
not note
employ
the chemical
introduced
bycovariant
LNV,88 but
instead
we the
we
immediately
that potential
this gradient
is a fully
tensor.
Because
follow the
derivation
Ochsenfeld and
because
McWeeny’s
density
matrix
is fullyofcontravariant
in Head-Gordon,
‘‘covariant integral
representation,’’
it
ðiÞ
68
with a trialautomatically
density matrix
Pgenerate
) by searching
for annumber.
energy
minimum
along
purification
preserves
the
electron
Therefore,
tothe
is
tensorially
inconsistent
to
a new
density
matrix
by
adding
the
fully
direction
of the negative energythe
gradient:
ðiÞ energy
avoid
the diagonalization
SCF procedure,
we minimize
covariant
gradient to thewithin
fully contravariant
density
matrix Pthe
. Converting
functional
the
covariant to contravariant indices by applying
the inverse metric, we
ðiÞ
qE½P
(
& the
½116(
! ¼ PðiÞ % s of
"ðiÞenergy gradient116 as follows:
find the tensorially consistentPðiþ1Þ
formulation
1 ~ ~ qP
~
~
E ¼ tr Ph þ PGðPÞ
½115'
where s is the step length. The gradient2is built by forming the derivative of the
Diagonalization alternatives
mn
ml
sn
ðrEÞ
¼
g
ðrEÞ
g
ls
energy functional (Eq. [115]):
½119(
~ is the%1
ml
sn
with respect to density-matrix changes,
P
inserted
purified
density.
%1where
¼ ðS Þ ðrEÞls ðS Þ
The simplest
~ approach
~ qP
~ is therefore to optimize the density matrix (e.g., starting
qE
qE
¼
¼ 3FPS þ 3SPF % 2SPFPS % 2FPSPS % 2SPSPF
½117(
~
qP fully
With this the
qP qPcontravariant energy gradient results:
E
At convergence
expression reduces to the usual criterion
qE this%1energy gradient
%1
%1
3S It FP
þ 3PFS%1to% note
2PFPthat
% 2S
FPSP % 2PSPFS
½120(
¼ SPF ¼
ofðrEÞ
FPS %
¼ 0.
is important
the covariant
energy gradient
qP
(Eq. [117]) cannot be added directly to the contravariant one-particle density
Computational
advantage?
matrix, so that a transformation
with the metric is
required.
Because all matrices for its formation can be built in a linear-scaling
Therefore, let us look briefly at the tensor properties of the energy grafashion
because
they gradient
are sparse
systems
a nonvanishing
dient. and
Rewriting
the energy
(Eq. for
[117])
in tensorwith
notation,
use optimization techniques
!
to find minimum
~
qE
¼ ðrEÞmn ¼ 3Fml Pls Ssn þ 3Sml Pls Fsn
qP
mn
% 2Sml Pls Fsa Pab Sbn
% 2Fml Pls Ssa Pab Sbn
ls
Pab
½118(
Fast Multipole Method (FMM)
CFMM
GvFMM
Multipole expansion:
(µ⌫|
)=
XX
“interaction tensor”
qkn (A)Tkn,lm (AB)qlm (B)
kn lm
A
B
multipoles
Translation operator:
X
Vlm (B 0 ) =
⇥lm,kn (B 0 B)Tkn,lm (AB)qlm (B)
kn
A hierarchical (tree) code allows to calculate all
Coulomb interactions at O(N) cost.
(C)FMM / GvFMM
L0
L1
L2
L3
log(N)
levels
electron integrals is an extremely expensive step as far as disk space and input/
output (I/O) time are concerned. For large molecules the required disk space (and
calculation time, see discussion below) easily exceeds all available capacities.
Almlo¨ f et al.19 observed in a AT
seminal
paper that recomputing inten DNA base pairs
grals whenever needed, rather than storing them to disk, could not only be
Illustrative timings
12
Conventional Fock matrix formation
Diagonalization
CFMM/LinK Fock matrix formation
10
CPU time [h]
8
6
4
2
0
0
2000
4000
6000
8000
Number of basis functions
10000
Reduced-scaling papers are tricky …
error
speed
Density functional theory (DFT)
> 90% of all calculations are DFT
Fµ⌫ = hµ⌫ + Jµ⌫
xc
Kµ⌫ + vµ⌫
Techniques discussed in sessions 1-2 apply also to DFT.
References this session is based on
• Introduction to Computational Chemistry, by Frank Jensen, ISBN-13:
978-0470011874. My favorite book for getting a quick start into almost any
area of computational chemistry. Relevant chapters: 3-6, 9-12.
• Szabo & Ostlund: Modern Quantum Chemistry, Dover Publications
• Ochsenfeld, Kussmann, and Lambrecht: Linear-Scaling Methods in Quantum
Chemistry; Rev. Comp. Chem. Vol. 23, Eds. K. Lipkowitz, p. 1-82 (2007)
Backup Slides
ferred
over
the
MO
coefficients
inNote
eq.elements
(3.54)
forand
optimization,
the lat
ational
parameters
contained
in an
XFock
matrix.
that the
X-variables
aresince
pre∂
x
ia
and
E″
(0))
can
be
written
in
terms
of
matrix
two-electron
inteational
parameters
contained
in an
X matrix.
Note that the X-variables
are p
erred over
the
MO
coefficients
in
eq.
(3.54)
for
optimization,
since
the
latte
not
independent
(the
MOs
must
be
orthonormal).
The
exponential
may
be w
ferred
over
the
MO coefficients
in eq. type
(3.54)wave
for optimization,
sinceare
the given
latter are
17
(
2
grals
in
the
MO
basis.
For
an
RHF
function
these
in
eq.
ferred
over
the
MO
coefficients
in
eq.
(3.54)
for
optimization,
since
the
latter
d ij must
f a be
Fbe
fand
d ab
f i Fexpanded
f
−the
+
∂ asEaMOs
borthonormal).
j The
⎡expansion,
series
energy
in terms
of⎤the
X-variables
d
not independent
(the
must
orthonormal).
exponential
may
be
wr
not independent
(the MOs
The
exponential
may
be written
4 ⎢ MOs must mixing
=occupied–virtual
(3.68).not independent
16 exponential may be writ
(the
be orthonormal).
The
⎥
ing
the
of
the
orbitals.
as
a
series
expansion,
and
the
energy
expanded
in
terms
of
the
X-variables
describ- des
s a series expansion,
and
the
energy
expanded
in
terms
of
the
X-variables
∂
∂
x
x
f
f
f
f
f
f
f
f
f
f
f
f
−
−
⎣
⎦
ia
jb
i b the
a energy
j
i j 16 a b in terms
i a of jthe
b X-variables descr
a series
expansion,
and
expanded
ingasthe
occupied–virtual
mixing
of
the
orbitals.
X
∂E
1 16
e
=
+
X
+
XX + .16. .
1
ng the occupied–virtual
mixing
of
the
orbitals.
= 4 f i F fmixing
2
ing
the
occupied–virtual
of
the
orbitals.
a
1
The gradient of the energy
off-diagonal
element1 of the molecular Fock m
∂xia e Xis= an
+
XX
+
.
.
.
1+ X
E1 (2X ) = E
(3.67)
X
1 ( 0 ) + E ′( 0 )X + 2 XE ′′(0 )X + . . .
X
(3.68)
2e
e
=
+
X
+
XX
+
.
.
.
1
which is easily calculated
from
the
atomic
Fock
matrix.
The
second
deriv
1
=
+
X
+
XX
+
.
.
.
1
d
f
F
f
d
f
F
f
−
+
∂ E E(X )⎡= E
ij ( 0a) +2Eb′( 0 )X2ab
⎤
+ 2 iXE ′′j(0)X + . . .
(3.(
The first =
and
to
the
X-variables
4 ⎢second derivatives of the1 energy with respect
⎥⎦.AO
however,The
involves
two-electron
integrals
that
require
an
to
MO
transfo
E
X
E
0
E
0
X
X
E
0
X
.
.
=
+
+
+
′
′′
1
(
)
(
)
(
)
(
)
∂
∂
x
x
f
f
f
f
f
f
f
f
f
f
f
f
−
−
⎣
ia
jb
i
b
a
j
i
j
a
b
i
a
j
b
2
E″) (0))
be
written
in
terms
of
Fock
matrix
elements
and
two-electro
first and and
second
derivatives
of
the
energy
with
respect
to
the
X-variables
(E′(0)
E (X
0
E
0
X
X
E
0
X
.
.
.
= Ecan
+
+
+
′′( )
( ) ′( )
Direct minimization (DM)
2
17
tion (see
Section
4.2.1),
and
is
therefore
computationally
expensive.
and
E″
(0))
can
be
written
in
terms
of
Fock
matrix
elements
and
two-electron
grals
in
the
MO
basis.
For
an
RHF
type
wave
function
theseinteare given
The
first
and
second
derivatives
of
the
energy
with
respect
to
the
X-variables
(E′
The gradient of the energy
is
an
off-diagonal
element
of
the
molecular
Fock
matrix,
17
TheIn
first
and
second
derivatives
energy
with
respect
to
the
X-variables
(
grals
inE″
the
MO
basis.
For of
aninthe
RHF
type
wave
function
these
are
given
in eq.matri
(3.68).
awhich
density
matrix
formulation,
the
energy
depends
on
the
density
and
(0))
can
be
written
terms
of
Fock
matrix
elements
and
two-electron
in
is easily calculated from the atomic Fock matrix. The second derivative,
17
nd E″however,
(0))
caninvolves
be
written
in terms
matrix
grals
in the
MO
basis.
For
anFock
RHF
type
wave
function
these
arecontracti
given in
ments
as(3.68).
variables,
and
can
formally
be
written
aselements
the
trace
of two-electron
the
two-electron
integrals
that
require
an
AO
to and
MO
transforma∂Eof
= 4 fi F fa
17
(3.68).
∂
E
rals density
intion
the(see
MO
basis.
For
an
RHF type
wavehfunction
these are given
i
Section
4.2.1),
andone-electron
is therefore
computationally
the
matrix
with
the
matrix
andexpensive.
the two-electron
matr
= 4 f i F f∂axia
a density
matrix
the ⎡energy
depends on the density matrix
ele∂xformulation,
ia implicitly
3.68).
d D.
with theInlatter
depending
ij f a F f b − d ab f i F f j +
⎤(3.68)
∂E ∂ 2 E =on
4
4
2
=ij 4f afFi fFbfbe
ments as variables,∂ and
written
trace ⎤of the contraction
of
a− ⎢d ab
fi F
ffj as
+− the
⎥
E can⎡dformally
∂
∂
x
x
f
f
f
f
f
f
f
f
f
f
f
−
⎣ i b a j
ia
jb
i j
a b
i a
j b ⎦
∂=x4ia⎢one-electron
⎥
the density matrix
with
the
matrix
h
and
the
two-electron
matrix G, (
∂E∂xiaE
=
trace
Dh
+
trace
DG
D
)
(
)
(
(
)
)
∂x(jbD
f
f
f
f
f
f
f
f
f
f
f
f
−
−
⎣
⎦
(3.m
i b
a j
i j
a b
i a
j b
2f F
4
f
=
ELECTRONIC
STRUCTURE
METHODS:
INDEPENDENT-PARTICLE
MODELS
The
gradient
of
the
energy
is
an
off-diagonal
element
of
the
molecular
Fock
d
f
F
f
d
f
F
f
−
+
i
a
∂
E
a D.
b
ab i
j
⎡ ij on
⎤
with the latter depending implicitly
4 ⎢off-diagonal
∂which
xiathe
The gradient
of
energy
is=an
element
offreely,
theFock
molecular
Fock
matrix,
is
easily
calculated
from
the
atomic
matrix.
The
second
deri
⎥
density
matrix
elements
cannot
be
varied
however,
as
the
or
∂xEia(∂D
x jb) = trace
f
f
f
f
f
f
f
f
f
f
f
f
−
−
⎣
⎦
i( Dh
b )a+ trace
j
i
j
a
b
i
a
j
b
(3.69) (
(DGmatrix.
(D)) The second derivative,
which is easily calculated from the atomic Fock
The
2 however, involves
two-electron
integrals
that require an AO to MO trans
d
f
F
f
d
f
F
f
−
+
∂
E
optimization
step,
but
the
non-idempotent
density
matrix
derived
from
taking
ij
a
b
ab
i
j
⎡
⎤
must remain
orthonormal,
and
this
constraint
can
be
formulated
as
the
de
however,
involves
two-electron
integrals
that
require
an
AO
to
MO
transformaThe
gradient
of
the
energy
is
an
off-diagonal
element
of
the
molecular
Fock
mat
tion
(see
Section
4.2.1),
and
is
therefore
computationally
expensive.
4⎢
=elements
18
The
density
matrix
cannot
be
varied
freely,
however,
as
the orbitals
⎥
optimization
step
can
be
“purified”
by
the
McWeeny
procedure.
matrix having
to
be
idempotent,
DSD
=
D.
It
is
difficult
to
ensure
this
duri
tion
(see
and
isftherefore
computationally
expensive.
∂xSection
∂easily
x In
f
f
f
f
f
f
f
f
f
f
f
−
−
a density
matrix
formulation,
the
energy
depends
on
the density
matr
⎣
⎦
ia
jb 4.2.1),
which
is
calculated
from
the
atomic
Fock
matrix.
The
second
derivati
i
b
a
j
i
j
a
b
i
a
j
b
must In
remain
orthonormal,
and
this
constraint
can
be
formulated
as
the
density
a density
matrix
formulation,
energy
depends
on theas
density
matrix
elements
as variables,
andthe
can
formally
written
the
trace
of the
contrac
2
3be require
however,
involves
two-electron
integrals
that
an
AO
to
MO
transform
(3.
Dformally
=be3=D
2isD
matrix
having
to
be idempotent,
DSD
D. −Itelement
ensure
this during
purified
ments
variables,
andiscan
written
asdifficult
thematrix
trace
of
the
contraction
of an ma
The gradient
ofasthe
energy
an
off-diagonal
oftothe
molecular
Fock
m
the
density
matrix
with the
one-electron
h and
the two-electron
tion (see Section 4.2.1), and is therefore computationally expensive.
the
density
matrix
with
the depending
one-electron
matrixFock
hon
and
the two-electron
matrix G,deriva
with
the
latter
implicitly
D.
which
is
easily
calculated
from
the
atomic
matrix.
The
second
In a density
matrix formulation,
energy
depends
on the density
matrix eo
The idempotency
condition
ensures thattheeach
orbital
is occupied
by exactly
with the latter depending implicitly on D.
ments
as two-electron
variables,
and can
formally
bethis
written
as the
trace
contraction
however,
integrals
that
require
an
AO
tothe
MO
transfo
E (D) = trace
(Dh
) + trace
(DG
(Dallow
)) of
electron.involves
E.
Cancès
has shown
that
relaxing
condition
to
fractional
oc
A
B
C