Thirdly, a similar approach to the SHM Send Buffers was implemented. However, to avoid both the non-portable code and the use of a shared memory API that breaks the MPI distributed-memory model, an alternative was used in which MPI_Gatherv and MPI_Scatterv collect data on the root process of each SMP node; this follows a similar implementation in CASTEP. While similar in structure to the SHM implementation, an extra step is needed at both the packing and unpacking stages. Since all data sent from a given process to the root of its SMP node by MPI_Gatherv must be contiguous, it must be unpacked from this receive buffer into the send buffer used for the MPI_Alltoallv call that performs the global transpose. A corresponding operation is also needed after the MPI_Alltoallv, before the data is sent back to each process on the SMP node using MPI_Scatterv.
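To make the extra packing and unpacking steps concrete, the following is a minimal Python simulation of the communication pattern described above: a Gatherv onto each node root, a repack so data for each remote node is contiguous, an Alltoallv between roots, and a final repack before the Scatterv. The function name, the use of (source, destination) tags in place of real FFT data, and the specific buffer orderings are illustrative assumptions, not the actual implementation; real code would use MPI calls on node-local and root communicators.

```python
from itertools import chain

def two_level_transpose(n_nodes, ppn):
    """Simulate the Gatherv -> repack -> Alltoallv -> repack -> Scatterv
    pattern for a global transpose. Each data chunk is tagged (src, dst).
    This is a sketch of the communication pattern only, not real MPI code."""
    P = n_nodes * ppn                     # total number of processes
    node = lambda p: p // ppn             # which SMP node a rank lives on

    # Each process packs one chunk per global destination into a single
    # contiguous buffer, as MPI_Gatherv to the node root requires.
    send = {p: [(p, q) for q in range(P)] for p in range(P)}

    # Stage 1 (MPI_Gatherv analogue): each node root concatenates the
    # contiguous buffers of all processes on its node.
    gathered = {
        n: list(chain.from_iterable(send[p] for p in range(n * ppn, (n + 1) * ppn)))
        for n in range(n_nodes)
    }

    # Extra packing step: on the root, reorder the gathered chunks so that
    # all data bound for a given remote node is contiguous, ready for the
    # MPI_Alltoallv between node roots.
    a2a_send = {
        n: sorted(gathered[n], key=lambda c: (node(c[1]), c[1], c[0]))
        for n in range(n_nodes)
    }

    # Stage 2 (MPI_Alltoallv analogue): each root receives exactly the
    # chunks whose destination process lies on its node.
    a2a_recv = {
        n: [c for m in range(n_nodes) for c in a2a_send[m] if node(c[1]) == n]
        for n in range(n_nodes)
    }

    # Extra unpacking step: reorder so the chunks for each local process
    # are contiguous, as MPI_Scatterv requires, then "scatter" them.
    recv = {}
    for n in range(n_nodes):
        buf = sorted(a2a_recv[n], key=lambda c: (c[1], c[0]))
        for i, q in enumerate(range(n * ppn, (n + 1) * ppn)):
            recv[q] = buf[i * P:(i + 1) * P]
    return recv
```

After the transpose, each process holds one chunk from every other process, e.g. `two_level_transpose(2, 2)[0]` is `[(0, 0), (1, 0), (2, 0), (3, 0)]`. The two `sorted` calls are the extra packing and unpacking steps that the SHM version avoids, since there each process can write directly into the shared buffer at its final offset.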
Compiling the code to use the Scatter/Gather Alltoallv requires the __FFT_LOCAL_COMM macro to be defined at compile time.