The Parallelisation of VASP

How VASP is parallelised is covered in some detail elsewhere [12], so I shall cover only the
aspects which are relevant to the work here.

VASP 5.2.2 offers parallelisation (and data distribution) over bands and over plane wave
coefficients, and both may be used together. How the work is divided is controlled by the NPAR tag
in the INCAR file. In particular, if there are a total of NPROC cores in the job, each band is
distributed over NPROC/NPAR cores. Thus, if NPAR=1 the coefficients of every band are distributed
over all cores, while if NPAR=NPROC the coefficients for a given band all reside on a single
core.
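As an illustration of the NPROC/NPAR arithmetic, the following Python sketch maps each core to the
bands whose coefficients it holds a share of. It is a hypothetical helper, not VASP code: it assumes
that NPAR divides NPROC, that the cores are split into NPAR contiguous groups, and that bands are
dealt round-robin to those groups. VASP's actual assignment of bands to cores may differ.

```python
def band_layout(nproc: int, npar: int, nbands: int) -> dict[int, list[int]]:
    """Map each core rank to the bands whose plane wave coefficients it holds a share of.

    Illustrative assumptions (not taken from the VASP source):
      * npar divides nproc exactly;
      * ranks are split into npar contiguous groups of nproc // npar cores;
      * bands are dealt round-robin to the groups (band b -> group b % npar).
    """
    if nproc % npar != 0:
        raise ValueError("this sketch assumes NPAR divides NPROC")
    cores_per_band = nproc // npar  # each band is spread over NPROC/NPAR cores
    layout: dict[int, list[int]] = {rank: [] for rank in range(nproc)}
    for band in range(nbands):
        group = band % npar
        # Every core in this band's group holds a slice of the band's coefficients.
        for rank in range(group * cores_per_band, (group + 1) * cores_per_band):
            layout[rank].append(band)
    return layout
```

For example, band_layout(8, 2, 4) places bands 0 and 2 on cores 0 to 3 and bands 1 and 3 on cores
4 to 7, each band being spread over 8/2 = 4 cores. With NPAR=1 every core holds a share of every
band, while with NPAR=NPROC each core holds whole bands of its own.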

NPAR reflects a tension between conflicting requirements. If each band is distributed across all
cores (NPAR=1), the communication costs of the parallel three dimensional Fast Fourier Transforms
(FFTs) required by the plane wave pseudopotential method are high, but the costs of the linear algebra
operations the method also requires, such as orthogonalisation and diagonalisation, are relatively low.
On the other hand, if NPAR=NPROC the FFTs require no communication at all, but the communication cost
of the linear algebra operations is high. NPAR must therefore be chosen carefully to obtain the best
possible performance, and the optimal value will depend upon, amongst other factors, the chemical
system being studied, the hardware on which the run is performed and the number of cores used.
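Because the best value has to be found empirically, one simple approach is to time a short run at
each admissible NPAR. The hypothetical helper below only enumerates the candidates, under the
assumption that NPAR must divide NPROC; the benchmarking itself is left to the user.

```python
def npar_candidates(nproc: int) -> list[tuple[int, int]]:
    """Return (NPAR, cores per band) pairs for every NPAR that divides NPROC.

    Small NPAR spreads each band over many cores (heavier parallel FFT communication);
    large NPAR keeps each band on few cores (heavier linear algebra communication).
    """
    return [(npar, nproc // npar) for npar in range(1, nproc + 1) if nproc % npar == 0]
```

For instance, npar_candidates(12) yields six settings, from (1, 12), where each band is spread over
all 12 cores, to (12, 1), where each band sits on a single core.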

NPAR therefore allows a run to stress either the parallel FFT or the parallel diagonalisation.
Unfortunately, neither of these operations scales well on distributed memory parallel architectures [13]-[14], especially for the moderately sized grids and matrices used in many VASP runs.

Asimina Maniopoulou 2011-07-09