The HECToR Service is now closed and has been superseded by ARCHER.

VASP Benchmark Results

We have run a number of VASP use cases on HECToR and investigated the parallel scaling and the effect of VASP runtime parameters on performance. This page summarises the results to help users select an optimal configuration for their VASP calculations.

The full set of benchmarking results is also available.

Runtime parameters

Our benchmarks look at the variation of VASP performance as a function of the number of MPI tasks, and also with the following VASP runtime parameters, which can be set by the user in the INCAR file:

  • NPAR - Changes the balance between parallelisation over bands and over plane waves. For exact-exchange calculations this parameter is fixed at the number of MPI tasks.
  • NSIM - Changes the number of bands treated simultaneously (with NSIM > 1 several bands are optimised at once, effectively replacing matrix-vector with matrix-matrix multiplications).
  • LPLANE - Changes the parallel decomposition of the 3D FFT. .TRUE. leads to a slab decomposition where two dimensions of the FFT are on the same MPI task (reduces collective communications but can introduce load imbalance or limit parallelism). .FALSE. leads to a pencil decomposition (more collective communications but more potential for parallelism and load-balancing).
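These are ordinary INCAR tags. As an illustration only (the values here are one combination from the tables below, not a recommendation), the relevant INCAR fragment might look like:

```
# Parallelisation-related INCAR tags (illustrative values)
NPAR   = 8        # balance between band and plane-wave parallelisation
NSIM   = 8        # number of bands optimised simultaneously
LPLANE = .TRUE.   # slab decomposition of the 3D FFT
```

The optimal values are system- and core-count-dependent, which is what the benchmarks below explore.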

We also investigated the effect of underpopulating HECToR compute nodes so that only one core per Bulldozer module is used. This increases the memory and interconnect bandwidth available to each MPI task and gives each task exclusive access to the module's double-width floating-point unit.
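On HECToR's Cray XE6 nodes this placement was controlled through the aprun launcher. The lines below are a sketch only (task counts illustrative, matching the Stride = 2 rows in the tables below); consult the aprun man page for the definitive flag semantics:

```
# Fully populated node: 32 tasks per node, 8 per die
aprun -n 512 -N 32 -S 8 ./vasp

# Underpopulated: 16 tasks per node, 4 per die; -d 2 places tasks
# two cores apart, so each Bulldozer module runs a single task
aprun -n 512 -N 16 -S 4 -d 2 ./vasp
```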

Benchmarks

We used the following benchmarks:

  • TiO2 Supercell - 750 atoms, pure DFT, Γ-point, 6 electronic minimisation steps.
  • LiZnO - 64 atoms, pure DFT, Γ-point, single-point energy.
  • LiZnO - 64 atoms, exact-exchange, Γ-point, single-point energy.

Results

TiO2 Supercell

Nodes | MPI Tasks | Tasks per Node | Tasks per Die | Stride | NPAR | LPLANE  | NSIM | Time / s | Scaling
    4 |       128 |             32 |             8 |      1 |    8 | .TRUE.  |    8 |   1827.8 |     1.0
    8 |       256 |             32 |             8 |      1 |   16 | .TRUE.  |    1 |   1050.6 |
   16 |       512 |             32 |             8 |      1 |   32 | .TRUE.  |    8 |    662.4 |
   32 |       512 |             16 |             4 |      2 |   32 | .FALSE. |    1 |    465.5 |
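The tabulated times can be turned into relative speedup and parallel efficiency figures. A minimal Python sketch using the TiO2 rows above, taking the 4-node run as the baseline:

```python
# Relative speedup and parallel efficiency for the TiO2 runs above.
# Baseline: the 4-node, 128-task run (1827.8 s).
runs = [(4, 1827.8), (8, 1050.6), (16, 662.4), (32, 465.5)]
base_nodes, base_time = runs[0]

for nodes, time in runs:
    speedup = base_time / time                   # relative to 4 nodes
    efficiency = speedup / (nodes / base_nodes)  # ideal speedup = nodes/4
    print(f"{nodes:3d} nodes: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

With these numbers the parallel efficiency falls to roughly 0.49 at 32 nodes, so for this benchmark the extra nodes beyond 16 buy relatively little.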

LiZnO Exact Exchange

Nodes | MPI Tasks | Tasks per Node | Tasks per Die | Stride | NPAR | LPLANE | NSIM | Time / s | Scaling
    1 |        32 |             32 |             8 |      1 |   32 | .TRUE. |    8 |   1357.1 |     1.0
    2 |        64 |             32 |             8 |      1 |   64 | .TRUE. |    1 |   1010.8 |
    4 |       128 |             32 |             8 |      1 |  128 | .TRUE. |    1 |    967.8 |
    8 |       128 |             16 |             4 |      2 |  256 | .TRUE. |    1 |    612.3 |

Fe FCC Supercell

Nodes | MPI Tasks | Tasks per Node | Tasks per Die | Stride | NPAR | LPLANE | NSIM | Time / s | Scaling
    4 |       128 |             32 |             8 |      1 |   16 | .TRUE. |   16 |   1308.8 |     1.0
    8 |       128 |             16 |             4 |      2 |   16 | .TRUE. |   16 |    857.1 |
   16 |       256 |             16 |             4 |      2 |   16 | .TRUE. |    8 |    622.9 |