For many years ab initio electronic structure calculations have been one of the mainstays of high performance computing (HPC). Whilst the methods used for these calculations have evolved, the approach introduced by Car and Parrinello in 1985 [1] has long been one of the most commonly employed. It is based upon density functional theory [2]; the Kohn-Sham equations [3] are solved within a plane wave basis set by minimisation of the total energy functional, with pseudopotentials [4], [5] used to obviate the explicit representation of core states. A review of the method can be found in [6]. Such is the importance of these methods that over 30% of all the cycles used on the phase2b component of HECToR [7], the UK's high-end computing resource, in the period from December 2010 to August 2011 were for packages performing total energy pseudopotential calculations.
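To make the role of k-point sampling in what follows concrete, here is a brief sketch of the standard plane wave formalism (the notation is conventional, not taken from any particular code): each Kohn-Sham orbital is expanded at every sampled k-point of the Brillouin zone as

\[
  \psi_{n\mathbf{k}}(\mathbf{r}) = \sum_{\mathbf{G}} c_{n\mathbf{k}}(\mathbf{G})\, e^{i(\mathbf{k}+\mathbf{G})\cdot\mathbf{r}},
\]

where n labels the bands and the sum runs over the reciprocal lattice vectors G below the plane wave cutoff. The total energy is assembled from weighted sums over the sampled k-points, and the orbitals at different k-points couple only through the charge density; it is this near-independence that makes the k-point index a natural axis for parallelism.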
One of the best known and most widely used packages for performing this type of calculation is VASP [8], [9], [10], [11], the Vienna Ab initio Simulation Package. Indeed, on HECToR it is the most extensively used package of all, and thus maximising its performance is vital for researchers using this, and related, machines. In this report I describe my recent work on improving the parallel scalability of the code for certain classes of common problems. I have achieved this by introducing a new level of parallelism into VASP based upon k-point sampling. Whilst such parallelism is common in similar codes, version 5.2.2, the latest release of VASP at the start of this project, did not support it, and I will show that through its use the scalability of calculations on small to mid-sized systems can be markedly improved. This is a particularly important class of problems, as the total energy calculation is often not the only operation to be performed. An important example is geometry optimisation, where very many total energy calculations may need to be performed one after another. The total size of the system under study is thus limited by time constraints, and so the parallel scaling of calculations on such moderate-sized systems must be good if many cores are to be exploited efficiently.
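The scheme can be pictured as splitting the full set of MPI processes into groups, each of which handles a subset of the k-points, with the existing parallelism operating unchanged inside each group. The following is a minimal sketch of that communicator-splitting idea in Python with mpi4py; the names (n_kpoint_groups, band_energy, the round-robin assignment) are illustrative assumptions and not VASP's actual implementation.

    # Sketch of k-point-level parallelism via MPI communicator splitting.
    # Illustrative only: none of these names come from the VASP sources.
    from mpi4py import MPI

    def band_energy(k: int) -> float:
        """Placeholder for the expensive per-k-point work (diagonalisation etc.)."""
        return float(k)

    world = MPI.COMM_WORLD
    rank = world.Get_rank()
    size = world.Get_size()

    n_kpoint_groups = 4   # hypothetical user-chosen number of k-point groups
    assert size % n_kpoint_groups == 0, "groups must divide the process count"

    # Processes with the same colour form one k-point group; within a group
    # the usual plane-wave/band parallelism would operate as before.
    colour = rank // (size // n_kpoint_groups)
    group_comm = world.Split(color=colour, key=rank)

    # Deal the k-points out round-robin, so each group computes only its own
    # subset of the Brillouin-zone sum.
    n_kpoints = 10
    my_kpoints = [k for k in range(n_kpoints) if k % n_kpoint_groups == colour]

    # In this toy version every rank in a group repeats the same work, so only
    # the group root contributes to the global reduction over all processes.
    local_sum = sum(band_energy(k) for k in my_kpoints)
    partial = local_sum if group_comm.Get_rank() == 0 else 0.0
    total = world.allreduce(partial, op=MPI.SUM)

    if rank == 0:
        print(f"sum over {n_kpoints} k-points using {n_kpoint_groups} groups: {total}")

Run with, say, mpiexec -n 8, each group of two ranks then owns roughly a quarter of the k-points; the point is simply that the groups proceed independently until the single reduction at the end.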