On the basis of the measurements summarised in section 2.1 above, it was considered that both SWT and SS3F would benefit from two changes: modernisation of the FFT routines used, to improve per-core performance, and conversion to a 2-D domain decomposition, to improve parallel scaling.
The 2DECOMP&FFT library [12], [11] enables applications using a three-dimensional structured mesh to use a 2-D (pencil) domain decomposition, and provides the parallel transpose operations that some numerical methods (such as FFTs) require in order to use such a decomposition. In addition, a higher-level interface (a 'black-box' 3-D FFT) is provided, as are routines for parallel I/O and halo support (the latter relevant to hybrid spectral-finite difference methods). The parallel I/O and halo routines were not needed in this project: both SWT and SS3F already use MPI-I/O to read and write restart files, and neither uses finite differences.
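To illustrate what a pencil decomposition means in practice, the following minimal sketch computes the index ranges owned by one process when a global mesh is divided over a 2-D process grid. It is not taken from 2DECOMP&FFT, SWT or SS3F: the function names, the row-major mapping of ranks onto the process grid, and the way remainder points are shared out are all assumptions made for illustration.

```python
def split_extent(n, parts, index):
    """Half-open range (start, stop) of block `index` when n points are
    split into `parts` nearly equal blocks (assumed remainder policy:
    the first n % parts blocks receive one extra point)."""
    base, extra = divmod(n, parts)
    start = index * base + min(index, extra)
    stop = start + base + (1 if index < extra else 0)
    return start, stop


def x_pencil(shape, p_row, p_col, rank):
    """Ranges held by `rank` when x is complete and y, z are distributed."""
    nx, ny, nz = shape
    row, col = divmod(rank, p_col)  # assumed row-major rank layout
    return (0, nx), split_extent(ny, p_row, row), split_extent(nz, p_col, col)


def y_pencil(shape, p_row, p_col, rank):
    """Ranges held by `rank` after a transpose that makes y complete."""
    nx, ny, nz = shape
    row, col = divmod(rank, p_col)
    return split_extent(nx, p_row, row), (0, ny), split_extent(nz, p_col, col)


# Hypothetical 8 x 6 x 4 mesh on a 2 x 2 process grid, viewed from rank 3.
print(x_pencil((8, 6, 4), 2, 2, 3))
print(y_pencil((8, 6, 4), 2, 2, 3))
```

In the x-pencil each process holds complete lines in x, so 1-D transforms in x need no communication. Transforms in y require the data to be redistributed into y-pencils first, which is the role of the parallel transpose operations.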
Although 2DECOMP&FFT supports several FFT libraries, it was decided to begin by converting the existing codes to use the FFTW library [6] in place of their existing FFT routines. This course of action was adopted in part because SWT requires Chebyshev transforms; although these can be implemented using FFT routines, 2DECOMP&FFT does not yet support them directly. They could, however, be provided by retaining SWT's own Chebyshev transform routines and replacing the underlying FFTs with those of a library supported by 2DECOMP&FFT, such as FFTW. In addition, it was reasoned that early conversion to FFTW would aid development work, by separating any slight numerical differences arising from the use of a different FFT library from those arising from parallelisation. Results from the existing codes (modified to use FFTW) could then be compared with those obtained using the new versions in the knowledge that the underlying FFT library was the same (provided, of course, that 2DECOMP&FFT was compiled to use FFTW).
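The statement that Chebyshev transforms can be implemented using FFT routines can be made concrete with a short sketch. It assumes samples on the Gauss-Lobatto points x_j = cos(pi j / N), extends them evenly to length 2N, and reads the Chebyshev coefficients from a discrete Fourier transform of the extended sequence. This is a generic textbook construction, not SWT's implementation. The function names are hypothetical, and a naive O(n^2) DFT stands in for a call to an FFT library such as FFTW.

```python
import cmath
import math


def dft(values):
    """Naive O(n^2) discrete Fourier transform; a stand-in for an FFT call."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * math.pi * j * k / n) for j, v in enumerate(values))
        for k in range(n)
    ]


def chebyshev_coefficients(samples):
    """Coefficients a_0..a_N of sum_k a_k T_k(x), given samples f(x_j) at
    x_j = cos(pi * j / N), j = 0..N."""
    n = len(samples) - 1
    # Even extension: f_0, ..., f_N, f_{N-1}, ..., f_1 (length 2N).
    extended = list(samples) + list(samples[-2:0:-1])
    spectrum = dft(extended)
    coeffs = [spectrum[k].real / n for k in range(n + 1)]
    coeffs[0] /= 2  # the end coefficients carry an extra factor of 1/2
    coeffs[n] /= 2
    return coeffs


# f(x) = 3 T_0 + 0.5 T_1 + T_2, with T_1 = x and T_2 = 2 x^2 - 1.
N = 4
points = [math.cos(math.pi * j / N) for j in range(N + 1)]
samples = [3 + 0.5 * x + (2 * x * x - 1) for x in points]
print(chebyshev_coefficients(samples))  # close to [3, 0.5, 1, 0, 0] up to rounding
```

Because only the routine playing the part of `dft` touches the FFT library, the transform logic around it is unchanged when one library is exchanged for another.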
The following work plan was therefore adopted: