We have examined several test cases with the new code. Here we present five:

In Tests 1-3 the PBE exchange-correlation functional is used and the k-mesh is generated by the Monkhorst-Pack method. Tests 4 and 5 involve Hartree-Fock calculations, with the k-mesh generated by the Gamma-centered method. All runs except the phonon calculation are single point energy calculations.
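For reference, both kinds of k-mesh are requested through the standard VASP KPOINTS file. The file below is illustrative only (the 4 4 4 grid is not one of the meshes used in these tests); replacing "Monkhorst-Pack" with "Gamma" on the third line requests a Gamma-centered mesh instead:

```
Automatic mesh
0
Monkhorst-Pack
4 4 4
0 0 0
```

Here the first line is a comment, "0" selects automatic mesh generation, the fourth line gives the subdivisions along the reciprocal lattice vectors, and the last line is an optional shift of the mesh.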

All runs were performed on the phase 2b component of the HECToR system, the UK's national supercomputing service. This is a large Cray XE6 system. The nodes are based on AMD Magny-Cours processors and contain 24 cores each, clocked at 2.1 GHz. Each node has 32 GB of memory, and inter-node communication is via Cray's Gemini network. More details may be found on the HECToR web site ([7]).

In Tables ([*]), ([*]), ([*]), ([*]) and ([*]) we compare the performance of VASP 5.2.2 with that of the new k-point parallelized code and study the scaling of the new code. In each case the original code is compared with the k-point code run with increasing numbers of k-point groups. All times reported are total run times, i.e. not just the time for the energy minimisation.
In Tables ([*]), ([*]), ([*]) and ([*]) we compare the performance of VASP 5.2.2 using the optimal NPAR value with that of the k-point parallelized code when the same number of cores is used. We demonstrate that efficient use of large numbers of cores is now possible for cases with more than one k-point.
It should be noted that an appropriate NPAR value is imperative for the efficient running of VASP. In the original code the optimal value of NPAR depends on the total number of cores employed; in the k-point parallelized code it depends on the number of cores in one k-group. Hence the value of NPAR that was optimal for the original code on $x$ cores will also be the most efficient choice for $n$ k-groups on $n \times x$ cores when using the k-point parallelized code.
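This scaling rule can be sketched as a small helper function. The function below is purely illustrative (it is not part of VASP, and its name and signature are our own invention); it only encodes the observation that NPAR applies within a single k-group, so the value that was optimal on $x$ cores carries over unchanged to $n$ k-groups on $n \times x$ cores:

```python
# Hypothetical helper (not part of VASP) illustrating the NPAR rule above.
def npar_for_kpoint_run(optimal_npar_on_x_cores: int,
                        n_kgroups: int,
                        x_cores: int):
    """Return (total cores, NPAR) for a k-point parallel run built from
    n_kgroups groups of x_cores cores each."""
    total_cores = n_kgroups * x_cores
    # NPAR is chosen per k-group, so it is independent of the number
    # of k-groups running in parallel.
    return total_cores, optimal_npar_on_x_cores

# Example: if NPAR = 4 was optimal on 96 cores with the original code,
# the same NPAR = 4 is the best choice for 8 k-groups on 768 cores.
print(npar_for_kpoint_run(4, 8, 96))  # (768, 4)
```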

Asimina Maniopoulou 2011-07-09