Introducing k-point Parallelism

Parallelisation over bands and over plane waves is not the only way in which ab initio electronic structure codes can exploit modern HPC resources. Many such codes, though not VASP, also parallelise over k-points; examples include CASTEP [15],[16] and CRYSTAL [17],[18]. k-points arise from the translational symmetry of the systems being studied [19]. This symmetry also means that many, but not all, of the operations at a given k-point are independent of those at any other k-point. This naturally provides another level of parallelism, and exploiting it has been shown to greatly increase scalability; a recent example is [20]. More generally, such hierarchical parallelism is one of the more common methods of scaling to very large numbers of cores [21].
The standard release of VASP does not, however, exploit this possibility, so we have modified VASP 5.2.2 to add this extra level of parallelisation. The cores are split into a number of groups, each of which performs the calculations for a subset of the k-points. The number of groups is specified by a new tag, KPAR, set in the INCAR input file. Thus if a run uses 10 k-points and KPAR is set to 2, there will be 2 k-point groups, each handling 5 k-points; if KPAR is set to 5, there will be 5 groups, each handling 2 k-points. KPAR therefore plays a role analogous to that of NPAR, mentioned above, but for k-point parallelism. Currently KPAR must divide exactly both the total number of k-points and the total number of cores used by the job; NPAR is also subject to the latter restriction.
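The grouping rule can be illustrated with a short Python sketch. It is not the code added to VASP (which is Fortran): the function names `is_valid_kpar`, `kpoint_groups` and `group_of_rank` are hypothetical, and assigning cores and k-points to groups in contiguous blocks is an assumption made for clarity. The sketch enforces the divisibility restriction described above and reproduces the 10 k-point example.

```python
# Hypothetical illustration of KPAR-style k-point grouping; not VASP's implementation.


def is_valid_kpar(n_kpoints: int, n_cores: int, kpar: int) -> bool:
    """KPAR must be positive and divide both the k-point and core counts exactly."""
    return kpar >= 1 and n_kpoints % kpar == 0 and n_cores % kpar == 0


def kpoint_groups(n_kpoints: int, n_cores: int, kpar: int) -> list[dict[str, list[int]]]:
    """Split cores and k-points into KPAR groups of equal size.

    Assumption: both cores and k-points are assigned in contiguous blocks.
    """
    if not is_valid_kpar(n_kpoints, n_cores, kpar):
        raise ValueError(
            f"KPAR={kpar} must divide both {n_kpoints} k-points and {n_cores} cores"
        )
    cores_per_group = n_cores // kpar
    kpoints_per_group = n_kpoints // kpar
    groups = []
    for g in range(kpar):
        groups.append(
            {
                "cores": list(range(g * cores_per_group, (g + 1) * cores_per_group)),
                "kpoints": list(range(g * kpoints_per_group, (g + 1) * kpoints_per_group)),
            }
        )
    return groups


def group_of_rank(rank: int, n_cores: int, kpar: int) -> int:
    """Index of the k-point group that a core (MPI rank) belongs to.

    In an MPI code this value could serve as the 'colour' when splitting the
    global communicator into one sub-communicator per k-point group.
    """
    return rank // (n_cores // kpar)


if __name__ == "__main__":
    # 10 k-points on 8 cores with KPAR=2: two groups of 4 cores, 5 k-points each.
    for group in kpoint_groups(n_kpoints=10, n_cores=8, kpar=2):
        print(group["cores"], group["kpoints"])
    # KPAR=5 divides 10 k-points but not 8 cores, so it is rejected.
    print(is_valid_kpar(n_kpoints=10, n_cores=8, kpar=5))
```

The exact-divisibility rule guarantees that every group receives the same number of cores and the same number of k-points, which is what keeps the bookkeeping in this sketch so simple.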
Introducing this further level of parallelism over k-points can greatly increase the scalability of the code, but it should not be viewed as a panacea. Not all operations are carried out per k-point, so Amdahl's law [22] places a limit on the scalability that can be achieved. Furthermore, some quantities are not perfectly parallel across k-points; an obvious example is the Fermi level, whose evaluation requires the eigenvalues from all k-points. Finally, k-point parallelism itself introduces some extra communication and synchronisation.
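To make the Amdahl's law limit concrete, the sketch below applies the law to the k-point level alone. Here `p` is a hypothetical fraction of the runtime that divides evenly among KPAR groups, each added group is assumed to be the same size as the original single group, and the extra communication and synchronisation mentioned above are ignored. The value 0.95 is purely illustrative, not a measurement from VASP.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Amdahl's law: S(n) = 1 / ((1 - p) + p / n).

    p: fraction of the runtime that can be split across n workers (here,
       k-point groups); the remaining 1 - p does not speed up at all.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - p) + p / n)


if __name__ == "__main__":
    p = 0.95  # hypothetical k-point-parallel fraction, for illustration only
    for kpar in (1, 2, 5, 10):
        print(f"KPAR={kpar:2d}  speedup={amdahl_speedup(p, kpar):.2f}")
    # Upper bound as KPAR grows without limit: 1 / (1 - p)
    print(f"limit       {1.0 / (1.0 - p):.2f}")
```

With these assumed numbers, ten k-point groups give a speedup of about 6.9 rather than 10, and no number of groups can exceed 20.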
Probably the biggest limitation on the use of k-point parallelism, however, is system size. As the unit cell used in the calculation grows, fewer k-points are required to converge the calculation to a given precision, and in the limit a single k-point may be sufficient to represent the system accurately. This limits what k-point parallelism can achieve, but many of the calculations that practising computational scientists require are not so large that they can be converged with a single k-point; recent examples include [23]-[26]. Parallelisation over k-points is therefore a useful technique in many cases, especially when the total energy calculation is only one part of a larger calculation, such as a geometry optimisation.
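The effect of cell size can be seen with a simple rule of thumb. For a cubic cell of side a, the reciprocal lattice vectors have length 2π/a, so a Monkhorst-Pack-style grid with a fixed target spacing needs about ceil((2π/a)/spacing) divisions along each axis. The sketch below assumes this rule and a hypothetical spacing of 0.2 Å⁻¹; it counts the full grid before any symmetry reduction and is not VASP's own grid-generation algorithm.

```python
import math


def kpoints_per_axis(a: float, spacing: float) -> int:
    """Grid divisions along one axis of a cubic cell.

    a:       lattice constant in Angstrom
    spacing: target k-point spacing in 1/Angstrom (2*pi included)
    Assumed rule: n = max(1, ceil(|b| / spacing)) with |b| = 2*pi / a.
    """
    b = 2.0 * math.pi / a  # length of each reciprocal lattice vector
    return max(1, math.ceil(b / spacing))


def full_grid_size(a: float, spacing: float) -> int:
    """Number of k-points in the full n x n x n grid (no symmetry reduction)."""
    n = kpoints_per_axis(a, spacing)
    return n ** 3


if __name__ == "__main__":
    spacing = 0.2  # hypothetical target spacing in 1/Angstrom
    for a in (4.0, 8.0, 16.0, 32.0):
        print(f"a = {a:4.1f} A: {kpoints_per_axis(a, spacing)} per axis, "
              f"{full_grid_size(a, spacing)} k-points")
```

Under these assumptions the grid shrinks from 512 k-points at a = 4 Å to a single k-point at a = 32 Å. Since KPAR must divide the number of k-points, the number of usable k-point groups falls with it, reaching one (no k-point parallelism at all) once a single k-point suffices.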
Asimina Maniopoulou 2011-07-09