Parallelisation over bands and over plane waves is not the only way in which ab initio
electronic structure codes can exploit modern HPC resources. Many such codes, though not VASP, also
exploit parallelisation over k-points, examples being CASTEP [15],[16] and CRYSTAL [17],[18]. k-points
arise ultimately from the translational symmetry of the systems being studied [19]. This symmetry also
means that many, though not all, operations at a given k-point are independent of those at another k-point.
This naturally permits another level of parallelism, and exploiting k-point parallelism has been
shown to greatly increase scalability; see [20] for a recent example. More generally, such
hierarchical parallelism is one of the more common routes to scaling to very large
numbers of cores [21].
The standard release of VASP does not, however, exploit this possibility. We have therefore
modified VASP 5.2.2 to add this extra level of parallelisation. The code is organised
so that the cores may be split into a number of groups, and each of these groups performs
calculations on a subset of the k-points. The number of such groups is specified by the new KPAR
tag, which is set in the INCAR input file. Thus if the run uses 10 k-points and KPAR is set to 2 there will
be 2 k-point groups each performing calculations on 5 k-points. Similarly if KPAR is set to 5 there
will be 5 groups each with 2 k-points. KPAR therefore plays a role analogous to that of NPAR,
mentioned above, except that it controls the k-point parallelism. Currently KPAR is restricted to
values that divide exactly both the total number of k-points and the total number of cores used by
the job; note that NPAR is also subject to the latter restriction.
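The grouping and divisibility rules above can be sketched as follows. Here `kpoint_groups` is a hypothetical helper written purely for illustration, not part of the VASP source:

```python
def kpoint_groups(n_kpoints, n_cores, kpar):
    """Split n_cores into KPAR groups, each handling an equal share of k-points.

    Mirrors the restriction described above: KPAR must divide exactly both
    the total number of k-points and the total number of cores.
    """
    if n_kpoints % kpar != 0 or n_cores % kpar != 0:
        raise ValueError("KPAR must divide both the k-point and core counts")
    # (k-points per group, cores per group)
    return n_kpoints // kpar, n_cores // kpar

# 10 k-points with KPAR = 2: two groups, each handling 5 k-points
print(kpoint_groups(10, 64, 2))   # -> (5, 32)
# KPAR = 5: five groups, each with 2 k-points
print(kpoint_groups(10, 40, 5))   # -> (2, 8)
```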
Introducing another level of parallelism over the k-points can greatly increase the scalability of
the code. However, it should not be viewed as a panacea. Not all operations involve k-points, and
thus Amdahl's law [22] places a limit on the scalability that can be achieved. Further, some
quantities are not perfectly parallel across k-points, the evaluation of the Fermi level being an
obvious example. Finally, k-point parallelism introduces some extra communication and
synchronisation.
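The Amdahl's law bound can be illustrated with a short sketch. The 5% serial fraction below is an assumed figure for illustration only, not a measured VASP value:

```python
def amdahl_speedup(serial_fraction, n_cores):
    """Amdahl's law: speedup = 1 / (s + (1 - s) / p) for serial fraction s on p cores."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)

# If 5% of the work does not parallelise over k-points (illustrative figure),
# 100 cores give under 17x speedup, and no core count can exceed 1/0.05 = 20x.
print(round(amdahl_speedup(0.05, 100), 1))   # -> 16.8
```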
The biggest limitation on the use of k-point parallelism, however, is probably system size. As the
unit cell used in the calculation grows, fewer k-points are required to converge the calculation to
a given precision, and in the limit a single k-point may suffice to represent the system
accurately. This limits what k-point parallelism can achieve, but for many practising computational
scientists the required calculations are not so large that they can be converged with a single
k-point; see [23]-[26] for recent examples. Parallelisation over k-points is therefore useful in
many cases, especially when the total energy calculation is only one part of a larger calculation,
for instance a geometry optimisation.
Asimina Maniopoulou 2011-07-09