As has already been noted in section 2.1, the number of k-points required for a Castep calculation decreases as the system size increases, so that for most HPC Castep calculations only O(1) k-points are required. If the simulation system is large enough, it can be described well by sampling only at the so-called Γ-point, k=(0,0,0). This is important because at the Γ-point the eigenstates can be chosen to be explicitly real, rather than complex. Not only does this halve the storage requirements for the wavefunctions, it also doubles the speed of many of the operations on the wavefunctions, including the FFTs.
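The storage saving can be illustrated with a small, self-contained sketch. It is not Castep code: the function names and the sample data are hypothetical, and a naive one-dimensional discrete Fourier transform stands in for the three-dimensional FFTs. It shows that a real function has Fourier coefficients satisfying c(-G) = c(G)*, so only about half of them need to be stored.

```python
import cmath

def dft(values):
    """Naive O(n^2) discrete Fourier transform (illustration only)."""
    n = len(values)
    return [
        sum(values[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]

def hermitian_defect(coeffs):
    """Largest deviation from the symmetry c(-k) = conj(c(k))."""
    n = len(coeffs)
    return max(abs(coeffs[k] - coeffs[(-k) % n].conjugate()) for k in range(n))

# Hypothetical real-valued "wavefunction" sampled on 8 real-space grid points.
real_signal = [0.5, -1.25, 2.0, 0.75, -0.5, 1.5, 0.0, -2.0]

coeffs = dft(real_signal)

# Zero to within rounding error, because the input is real.
defect = hermitian_defect(coeffs)

# Coefficients k = 0 .. n/2 determine all the others through the symmetry,
# so n//2 + 1 complex numbers suffice instead of n.
independent = len(real_signal) // 2 + 1
```

Storing only the independent coefficients is the same idea that real-to-complex FFT routines rely on, which is where the factor of roughly two in both memory and transform time comes from.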
Over the course of the two CDG workshops this optimisation has been implemented in the Castep 4.4 codebase, and has already been tested and shown to be working on HECToR. Switching on the Γ-point optimisations, a Castep groundstate calculation on a 1230 atom polypeptide, running on 512 cores of HECToR, went from a computational time of 27,523s to 14,261s, a speed-up of over 1.9.
In principle the orthogonalisation and subspace rotations can be quadrupled in speed, since all the complex-complex operations become real-real. However, in reciprocal-space the wavefunctions are still complex, and it is only their dot-products that are real. This is not straightforward to exploit using standard BLAS calls and has not yet been implemented fully in Castep.
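The following minimal sketch illustrates why the dot-products are real even though the reciprocal-space coefficients are complex. It is an illustration under stated assumptions, not the Castep implementation. It assumes a hypothetical storage layout in which only the G=0 coefficient and one member of each ±G pair are kept, and all function names are invented for the example.

```python
def expand(half):
    """Rebuild the full coefficient list from the half-stored one.

    Assumed layout: half[0] is the (real) G=0 coefficient and half[1:]
    holds one member of each +/-G pair; the partner is its conjugate.
    """
    return half + [c.conjugate() for c in reversed(half[1:])]

def full_dot(a, b):
    """Ordinary complex dot-product, sum over conj(a) * b."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def half_dot_real(a_half, b_half):
    """Same dot-product using real arithmetic on the half-stored data.

    Each +/-G pair contributes 2 * Re(conj(a) * b), and the real part is
    just the real dot-product of the (real, imag) components. The G=0
    term has no partner, so its double-counted contribution is removed.
    """
    total = 0.0
    for x, y in zip(a_half, b_half):
        total += x.real * y.real + x.imag * y.imag
    return 2.0 * total - a_half[0].real * b_half[0].real

# Hypothetical half-stored coefficients for two wavefunctions.
a_half = [1.0 + 0j, 0.5 - 0.25j, -0.75 + 1.0j]
b_half = [2.0 + 0j, -1.0 + 0.5j, 0.25 + 0.5j]

reference = full_dot(expand(a_half), expand(b_half))
fast = half_dot_real(a_half, b_half)
```

Here `reference` has a vanishing imaginary part and its real part agrees with `fast`. The correction for the G=0 term is one example of the bookkeeping that makes it awkward to map this directly onto a single standard BLAS call.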