VASP Benchmark Results
We have run a number of VASP use cases on HECToR and investigated parallel scaling and the effect of VASP runtime parameters on performance. This page summarises the results to help users select an optimal configuration for their own VASP calculations.
The full set of benchmarking results is also available.
Runtime parameters
Our benchmarks look at the variation of VASP performance as a function of the number of MPI tasks and of the following VASP runtime parameters, which can be set by the user in the INCAR file:
- NPAR - Changes the balance between parallelisation over bands and over plane waves. For exact-exchange calculations this parameter is fixed at the number of MPI tasks.
- NSIM - Changes the number of bands treated simultaneously (NSIM = 1 uses matrix-vector operations; larger values replace them with matrix-matrix operations).
- LPLANE - Changes the parallel decomposition of the 3D FFT. .TRUE. leads to a slab decomposition where two dimensions of the FFT are on the same MPI task (reduces collective communications but can introduce load imbalance or limit parallelism). .FALSE. leads to a pencil decomposition (more collective communications but more potential for parallelism and load-balancing).
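As an illustration, the settings from the best 512-task TiO2 run in the results below correspond to an INCAR fragment like this (treat it as a starting point rather than a universally optimal choice):

```
NPAR   = 32       ! balance between band and plane-wave parallelisation
NSIM   = 1        ! bands treated simultaneously
LPLANE = .FALSE.  ! pencil decomposition of the 3D FFT
```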
We also investigated the effect of underpopulating HECToR compute nodes by using one core per Bulldozer module. This increases the memory and interconnect bandwidth available to each MPI task and also gives each task exclusive access to the module's shared double-width floating-point unit.
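On HECToR's Cray XE6 nodes this kind of underpopulation is requested through the aprun placement flags. A sketch of a job-script line for a half-populated 32-node run (assuming 32 cores per node arranged as four 8-core dies, and an executable named vasp):

```shell
# One MPI task per Bulldozer module (16 of 32 cores per node used)
# -N 16 : MPI tasks per node (half-populated)
# -S 4  : MPI tasks per 8-core die
# -d 2  : stride of 2 cores between tasks
aprun -n 512 -N 16 -S 4 -d 2 ./vasp
```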
Benchmarks
We used the following benchmarks:
- TiO2 Supercell - 750 atoms, pure DFT, Γ-point, 6 electronic minimisation steps.
- LiZnO - 64 atoms, pure DFT, Γ-point, single-point energy.
- LiZnO - 64 atoms, exact-exchange, Γ-point, single-point energy.
Results

In the tables below, Scaling is the speedup in run time relative to the first row of each table.
TiO2 Supercell
| Nodes | MPI Tasks | MPI Tasks per Node | MPI Tasks per Die | Stride | NPAR | LPLANE | NSIM | Time / s | Scaling |
|---|---|---|---|---|---|---|---|---|---|
| 4 | 128 | 32 | 8 | 1 | 8 | .TRUE. | 8 | 1827.8 | 1.00 |
| 8 | 256 | 32 | 8 | 1 | 16 | .TRUE. | 1 | 1050.6 | 1.74 |
| 16 | 512 | 32 | 8 | 1 | 32 | .TRUE. | 8 | 662.4 | 2.76 |
| 32 | 512 | 16 | 4 | 2 | 32 | .FALSE. | 1 | 465.5 | 3.93 |
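The Scaling column is simply the baseline run time divided by each run time. A quick check for the TiO2 runs (assuming scaling is defined relative to the 4-node baseline):

```python
# Speedup of each TiO2 run relative to the 4-node baseline (1827.8 s).
times = [1827.8, 1050.6, 662.4, 465.5]  # Time / s column
scaling = [round(times[0] / t, 2) for t in times]
print(scaling)  # [1.0, 1.74, 2.76, 3.93]
```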
LiZnO Exact Exchange
| Nodes | MPI Tasks | MPI Tasks per Node | MPI Tasks per Die | Stride | NPAR | LPLANE | NSIM | Time / s | Scaling |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 32 | 32 | 8 | 1 | 32 | .TRUE. | 8 | 1357.1 | 1.00 |
| 2 | 64 | 32 | 8 | 1 | 64 | .TRUE. | 1 | 1010.8 | 1.34 |
| 4 | 128 | 32 | 8 | 1 | 128 | .TRUE. | 1 | 967.8 | 1.40 |
| 8 | 128 | 16 | 4 | 2 | 256 | .TRUE. | 1 | 612.3 | 2.22 |
Fe FCC Supercell
| Nodes | MPI Tasks | MPI Tasks per Node | MPI Tasks per Die | Stride | NPAR | LPLANE | NSIM | Time / s | Scaling |
|---|---|---|---|---|---|---|---|---|---|
| 4 | 128 | 32 | 8 | 1 | 16 | .TRUE. | 16 | 1308.8 | 1.00 |
| 8 | 128 | 16 | 4 | 2 | 16 | .TRUE. | 16 | 857.1 | 1.53 |
| 16 | 256 | 16 | 4 | 2 | 16 | .TRUE. | 8 | 622.9 | 2.10 |