The Parallelisation of VASP

The details of how VASP is parallelised are covered elsewhere [12]; I shall cover only those that are relevant to the work here.
VASP 5.2.2 offers parallelisation (and data distribution) over bands and over plane wave coefficients, and both may be used together. How this division occurs is controlled by the NPAR tag in the INCAR file. In particular, if there are a total of NPROC cores in the job, each band will be distributed over NPROC/NPAR cores. Thus, if NPAR=1 each band is distributed over all the cores, while if NPAR=NPROC the coefficients for a given band are all held on one core.
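The relationship between NPROC, NPAR and the number of cores sharing each band can be illustrated with a short Python sketch. This is not part of VASP: the function name cores_per_band is hypothetical, and the check that NPAR divides NPROC exactly is an assumption made here to keep the arithmetic clean.

```python
def cores_per_band(nproc: int, npar: int) -> int:
    """Return how many cores share the coefficients of one band.

    Follows the rule stated in the text: each band is distributed
    over NPROC/NPAR cores. Assumption of this sketch: NPAR must
    divide NPROC exactly.
    """
    if nproc < 1 or npar < 1:
        raise ValueError("nproc and npar must be positive integers")
    if nproc % npar != 0:
        raise ValueError("this sketch assumes npar divides nproc exactly")
    return nproc // npar


if __name__ == "__main__":
    nproc = 64  # illustrative core count, not taken from the source
    for npar in (1, 8, 64):
        print(f"NPAR={npar:2d}: each band spread over "
              f"{cores_per_band(nproc, npar)} core(s)")
```

With 64 cores, the two limiting cases in the text correspond to NPAR=1 (each band spread over all 64 cores) and NPAR=64 (each band held on a single core).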
NPAR reflects a tension between conflicting requirements. If each band is distributed across all processors (NPAR=1), the communication costs for the parallel three dimensional Fast Fourier Transforms (FFTs) required by the plane wave pseudopotential method are high, but the cost of the linear algebra operations required by the method, such as orthogonalisation and diagonalisation, is relatively low. On the other hand, if NPAR=NPROC the communication cost for the FFTs is non-existent, because each band resides entirely on one core, but it is high for the linear algebra operations. Thus NPAR must be chosen carefully to obtain the best performance possible, and the best value will depend upon the chemical system being studied, the hardware upon which the run is being performed and the number of cores being used (amongst other factors).
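Because the best NPAR depends on the system, the hardware and the core count, one plausible way to choose it is to time a short run for each candidate value. The sketch below only enumerates the candidates. It assumes that the values worth trying are the exact divisors of NPROC, and the function name candidate_npar_values is hypothetical rather than part of VASP.

```python
from typing import List


def candidate_npar_values(nproc: int) -> List[int]:
    """List NPAR values to benchmark for a job on nproc cores.

    Assumption of this sketch: only exact divisors of nproc are
    considered, so that NPROC/NPAR is a whole number of cores.
    """
    if nproc < 1:
        raise ValueError("nproc must be a positive integer")
    return [d for d in range(1, nproc + 1) if nproc % d == 0]


if __name__ == "__main__":
    nproc = 12  # illustrative core count, not taken from the source
    for npar in candidate_npar_values(nproc):
        # Each candidate would be written to the INCAR file as
        # "NPAR = <value>" and a short timing run performed.
        print(f"NPAR = {npar}  (cores per band: {nproc // npar})")
```

The two ends of the list are the limiting cases discussed above: NPAR=1 stresses the parallel FFT, while NPAR=NPROC stresses the parallel linear algebra.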
Thus the use of NPAR allows a run to stress either the parallel FFT or the parallel diagonalisation. Unfortunately, neither of these operations scales well on distributed memory parallel architectures [13]-[14], especially for the moderate size grids and matrices used in many VASP runs.

Asimina Maniopoulou 2011-07-09