To demonstrate the benefits of these changes to the user, the full PW.X executable was compiled with the original Alltoallv implementation and with the new padded Alltoall and SHM Alltoallv modifications, and then used to run the first two stages of the GWW calculation on HECToR Phase 2a. The results are shown in table 2.
In all cases but one, both the padded Alltoall and SHM Alltoallv methods are faster than the original Alltoallv implementation. Note that for CNT80, the nscf step is split in two (see section 2), and the first part is dominated by linear algebra rather than by the FFT. In all cases the SHM Alltoallv outperforms the padded Alltoall, so it is recommended that this method always be used on HECToR.
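The padded Alltoall variant compared above can be illustrated with a small serial simulation. This is a minimal sketch, assuming that "padded" means every per-destination block is padded to the size of the largest block so that the variable-count exchange (Alltoallv) can be replaced by a fixed-count one (Alltoall). The function name `padded_alltoall` and the nested-list data layout are hypothetical and do not reflect the actual PW.X code; the MPI exchange itself is only simulated with list slicing.

```python
from typing import List

# Hypothetical layout: send_blocks[src][dst] is the list of values that
# rank `src` sends to rank `dst` (block lengths may differ, as in Alltoallv).
Blocks = List[List[List[float]]]


def padded_alltoall(send_blocks: Blocks, pad_value: float = 0.0) -> Blocks:
    """Serially simulate an Alltoallv implemented as a padded Alltoall.

    Returns recv[dst][src], the unpadded data rank `dst` received from `src`.
    """
    nprocs = len(send_blocks)
    # Alltoall uses one block size for every pair: take the largest block.
    maxcount = max(len(send_blocks[s][d])
                   for s in range(nprocs) for d in range(nprocs))

    # Each rank packs a contiguous buffer of nprocs * maxcount elements,
    # padding every block up to maxcount.
    send_bufs = []
    for s in range(nprocs):
        buf: List[float] = []
        for d in range(nprocs):
            block = send_blocks[s][d]
            buf.extend(block + [pad_value] * (maxcount - len(block)))
        send_bufs.append(buf)

    # Simulated Alltoall: block d of rank s's send buffer is delivered to
    # block s of rank d's receive buffer.
    recv_bufs = [[pad_value] * (nprocs * maxcount) for _ in range(nprocs)]
    for s in range(nprocs):
        for d in range(nprocs):
            chunk = send_bufs[s][d * maxcount:(d + 1) * maxcount]
            recv_bufs[d][s * maxcount:(s + 1) * maxcount] = chunk

    # Receivers strip the padding using the true counts (the values that
    # would have been Alltoallv's recvcounts).
    recv: Blocks = []
    for d in range(nprocs):
        blocks = []
        for s in range(nprocs):
            n = len(send_blocks[s][d])
            blocks.append(recv_bufs[d][s * maxcount:s * maxcount + n])
        recv.append(blocks)
    return recv


if __name__ == "__main__":
    # Two ranks with uneven block sizes: rank 0 sends [1] to itself and
    # [2, 3] to rank 1; rank 1 sends [4, 5, 6] to rank 0 and nothing to itself.
    print(padded_alltoall([[[1], [2, 3]], [[4, 5, 6], []]]))
```

The trade-off this sketch exposes is that every pair now moves `maxcount` elements, so the padded version sends more data than Alltoallv whenever block sizes are uneven; it can still be faster if the platform's MPI library has a better-tuned Alltoall than Alltoallv, which is presumably what the timings in table 2 reflect.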