For many years ab initio electronic structure calculations have been one of the mainstays of high performance computing (HPC). Whilst the methods used for those calculations have changed, for many years now the method introduced by Car and Parrinello in 1985 [1] has been one of the most commonly employed. This is based upon density functional theory [2]; the Kohn-Sham [3] equations are solved within a plane wave basis set by minimisation of the total energy functional, with the use of pseudopotentials [4]-[5] to obviate the need to represent core states explicitly. A review of the method can be found in [6]. Such is the importance of these methods that over 30% of all the cycles used on the phase2b component of HECToR [7], the UK's high-end computing resource, in the period from December 2010 to August 2011 were for packages performing total energy pseudopotential calculations.
One of the best known and most widely used packages for performing this type of calculation is VASP [8], [9], [10], [11], the Vienna Ab initio Simulation Package. Indeed, on HECToR it is the most extensively used package of all, and thus maximising its performance is vital for researchers using this and related machines. In this report I describe my recent work on improving the parallel scalability of the code for certain classes of common problems. I have achieved this by introducing a new level of parallelism within VASP based upon k-point sampling. Whilst this is common in similar codes, the latest release of VASP when I started the project, version 5.2.2, did not support it, and I will show that through its use the scalability of calculations on small to mid-sized systems can be markedly improved. This is a particularly important class of problems, as often the total energy calculation is not the only operation to be performed. An important example is geometry optimisation, where very many total energy calculations may need to be performed one after another. Thus the total size of the system under study is limited by time constraints, and so the parallel scaling of the calculation on such moderately sized systems must be good if many cores are to be exploited efficiently.

Asimina Maniopoulou 2011-07-09