Test Case 5

The last test case is a phonon system with 20 k-points. In the original code, this test case requires NPAR to be equal to the total number of cores. According to the discussion in [*], this means that the linear algebra operations cannot be parallelized and only the FFT calculations are performed in parallel. For the k-point parallelized code, NPAR should accordingly be equal to the number of cores in one k-group.
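As a rough illustration of the NPAR rule stated above, the following sketch computes the NPAR value for a given core count and KPAR. It assumes that the cores are split evenly among the k-groups, and the function name `npar_for_run` is hypothetical; it is not part of VASP.

```python
def npar_for_run(total_cores: int, kpar: int = 1) -> int:
    """Return NPAR following the rule in the text: the number of cores in one k-group.

    Assumption of this sketch: cores are divided evenly among the KPAR k-groups.
    kpar=1 corresponds to the original code, where NPAR equals the total core count.
    """
    if total_cores < 1 or kpar < 1:
        raise ValueError("total_cores and kpar must be positive")
    if total_cores % kpar != 0:
        raise ValueError("total_cores must be divisible by kpar")
    return total_cores // kpar


print(npar_for_run(32))       # original code on 32 cores
print(npar_for_run(64, 2))    # k-point parallelized code, KPAR=2 on 64 cores
print(npar_for_run(128, 4))   # k-point parallelized code, KPAR=4 on 128 cores
```

Under this assumption, each of the three runs of this test case uses a k-group of 32 cores.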

Table: Scaling of Test Case 5 (where Speedup is taken to be 1 for 32 cores).
Test Case 5    Cores    Time (secs)    Speedup
VASP 5.2.2     32       399.929        1
KPAR=2         64       221.94         1.802
KPAR=4         128      112.407        3.558
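The Speedup column can be rechecked from the timings with a short script. The sketch below also derives a parallel efficiency (speedup divided by the increase in core count), which is not reported in the original table and is added here only for illustration; the name `scaling` is hypothetical.

```python
# Timings copied from the table above: (label, cores, wall time in seconds).
runs = [
    ("VASP 5.2.2", 32, 399.929),
    ("KPAR=2", 64, 221.94),
    ("KPAR=4", 128, 112.407),
]


def scaling(runs):
    """Return (label, speedup, efficiency) for each run, relative to the first run."""
    _, base_cores, base_time = runs[0]
    rows = []
    for label, cores, time in runs:
        speedup = base_time / time
        # Efficiency: achieved speedup divided by the factor of extra cores used.
        efficiency = speedup / (cores / base_cores)
        rows.append((label, round(speedup, 3), round(efficiency, 3)))
    return rows


for row in scaling(runs):
    print(row)
```

The computed speedups reproduce the tabulated values of 1.802 and 3.558.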

Firstly, Table [*] shows that the original code does not scale at all beyond 32 cores. The FFT communication cost becomes the bottleneck, and the computation cannot be completed in less than 400 secs with the original code.
Table [*], on the other hand, shows that with the k-point parallelized code we can employ 4 times more cores and complete the simulation in 112 secs (a speedup of 3.6). The problem, though, is that we cannot use more than 128 cores in this case, whereas we could potentially use 640 (the number of k-points (20) × 32). This is because new k-point meshes are generated during this particular calculation. When KPAR is an exact divisor of the number of k-points in the new mesh, our k-point parallelized code performs the calculation efficiently. When it is not, the code exits. In this case the original k-mesh had 20 k-points, the second k-mesh 52 and the third 68. Apart from 1, only 2 and 4 are common divisors of these three numbers. Hence the largest value that can be used for KPAR is 4.
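The divisibility constraint described above can be checked with a few lines of Python. This is a minimal sketch: the function name `valid_kpar_values` is hypothetical, and the only inputs are the three mesh sizes quoted in the text.

```python
from functools import reduce
from math import gcd


def valid_kpar_values(mesh_sizes):
    """Return every KPAR > 1 that exactly divides all the given k-point counts."""
    g = reduce(gcd, mesh_sizes)
    # Every common divisor of the mesh sizes is a divisor of their gcd.
    return [d for d in range(2, g + 1) if g % d == 0]


mesh_sizes = [20, 52, 68]  # k-points in the original, second and third meshes
candidates = valid_kpar_values(mesh_sizes)
print(candidates)                  # [2, 4]
print(max(candidates, default=1))  # 4
```

Because the greatest common divisor of 20, 52 and 68 is 4, no KPAR larger than 4 divides all three meshes.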

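Table: Comparison of the original and the k-point parallelized code on the same number of cores (where Speedup is taken to be 1 for the original code at each core count). The original code does not scale at all.
Test Case 5    Cores    Time (secs)    Speedup
VASP 5.2.2     64       603.294        1
KPAR=2         64       221.94         2.718
VASP 5.2.2     128      2477.339       1
KPAR=4         128      112.407        22.039
VASP 5.2.2     160      4651.663       1
KPAR=5         160      64.96          71.608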

Figure 5: Speedup of Test Case 5 (where Speedup is taken to be 1 for 32 cores).

Asimina Maniopoulou 2011-07-09