During later testing, it was discovered that a large proportion of
both the memory and the computational time of Castep calculations
was spent in the subroutine ion_beta_beta_recip. Both costs arise
because this subroutine has to construct a modified overlap matrix
between the so-called non-local projectors, often referred to as the
β-projectors. These projectors are arrays of plane-wave
coefficients, just like a wavefunction, but because they are
independent of the bands they are only distributed by plane-wave and
k-point. The precise operation this subroutine performs is the
construction of the matrix:

[Equation (4.1): image not recovered]
[Equation (4.2): image not recovered]

Our solution to this problem was to distribute the β-projectors
over the band-group. In the first phase the local β-projectors are
constructed, and a call to ZHERK computes the purely local
contribution to the matrix. The second phase requires the overlap of
the local projectors with the projectors on the other nodes, computed
via a call to ZGEMM. Rather than fetch the relevant data via
communications, we instead redefined the β-projectors at this point
so that they now contained the full effect of the diagonal matrix.
This allows us to compute the second contribution to the matrix by
simply computing the overlap between the local node's β-projectors
and the non-distributed β-projectors.
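
The two-phase scheme described above can be illustrated with a small,
serial sketch. It is not the Castep implementation. It assumes the
modified overlap has the form M[i][j] = sum over plane-waves g of
conj(beta_i[g]) * d[g] * beta_j[g], with d a non-negative diagonal
weight; the defining equations are not available here, so this form
is a guess. It further assumes that phase 1 uses projectors scaled by
sqrt(d) and that phase 2 rescales them to carry the full d, which is
one reading of "the full effect of the diagonal matrix". The names
overlap, distributed_overlap and max_abs_diff, and the round-robin
split of projectors over nodes, are hypothetical. Plain Python loops
stand in for ZHERK and ZGEMM, and the nodes of the band-group are
emulated by an ordinary loop.

```python
import math
import random


def overlap(rows, cols, weight=None):
    """Return M[i][j] = sum_g conj(rows[i][g]) * weight[g] * cols[j][g]."""
    n_pw = len(rows[0])
    w = weight if weight is not None else [1.0] * n_pw
    return [[sum(a[g].conjugate() * w[g] * b[g] for g in range(n_pw))
             for b in cols] for a in rows]


def distributed_overlap(beta, d, n_nodes):
    """Serially emulate the two-phase construction over n_nodes owners."""
    n_proj, n_pw = len(beta), len(d)
    root_d = [math.sqrt(x) for x in d]  # assumption: every d[g] >= 0
    result = [[0j] * n_proj for _ in range(n_proj)]
    for node in range(n_nodes):
        # Hypothetical round-robin ownership of projectors by node.
        mine = list(range(node, n_proj, n_nodes))
        others = [j for j in range(n_proj) if j not in mine]

        # Phase 1 (ZHERK-like): Gram matrix of the half-scaled local
        # projectors. Only one triangle is computed, then mirrored.
        half = [[root_d[g] * beta[i][g] for g in range(n_pw)] for i in mine]
        for a, i in enumerate(mine):
            for b in range(a, len(mine)):
                j = mine[b]
                val = sum(x.conjugate() * y for x, y in zip(half[a], half[b]))
                result[i][j] = val
                result[j][i] = val.conjugate()

        # Phase 2 (ZGEMM-like): rescale the local projectors so they carry
        # the full diagonal weight, then overlap them with the unscaled,
        # non-distributed projectors owned by the other nodes.
        full = [[root_d[g] * h[g] for g in range(n_pw)] for h in half]
        if mine and others:
            block = overlap(full, [beta[j] for j in others])
            for a, i in enumerate(mine):
                for b, j in enumerate(others):
                    result[i][j] = block[a][b]
    return result


def max_abs_diff(m1, m2):
    """Largest absolute element-wise difference between two matrices."""
    return max(abs(x - y) for r1, r2 in zip(m1, m2) for x, y in zip(r1, r2))


# Small demonstration with made-up data (5 projectors, 8 plane-waves).
rng = random.Random(0)
beta = [[complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(8)]
        for _ in range(5)]
d = [rng.uniform(0.1, 2.0) for _ in range(8)]
reference = overlap(beta, beta, d)
print(max_abs_diff(distributed_overlap(beta, d, 3), reference))
```

Under these assumptions, each emulated node fills only the rows of the
matrix belonging to its own projectors, and the printed difference from
the directly computed reference should be at the level of
floating-point rounding. No node needs another node's scaled
projectors, which is the point of folding the diagonal weight into the
local set before the second phase.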
Work completed, tested and working.