The original work plan called for the use of 2DECOMP&FFT's FFT API, that is, its "black-box" 3-D FFT routines. On investigation, it turned out that a lower-level approach was required. This is in fact the more convenient route, because the original codes were written in the expectation that optimisations involving intermediate arrays (which a 3-D FFT would hide) are possible; removing these optimisations would mean writing new code that would probably be less efficient. The need to implement dealiasing is a further impediment to using a black-box 3-D FFT.

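Dealiasing is most efficiently performed by pruning the array around each
parallel transposition: each direction is expanded only once the data reach the
pencils in which it is transformed, and pruned again as soon as the reverse
transform in that direction is complete, so that as little padding as possible
is transposed. The order of operations eventually adopted is as follows (the
initial letter of each item denotes the direction in which the domain is not
decomposed, e.g. *y* implies $y$-pencils containing all points in $y$ but only
a subset in $x$ and $z$; items without a letter are parallel transpositions):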

1. *y*: In wave space; global domain size $N_x \times N_y \times N_z$.
2. *y*: SS3F: Fourier transform in $y$; SWT: Chebyshev transform in $y$.
3. Transpose to *z*.
4. *z*: Expand domain size to $N_x \times N_y \times 3N_z/2$.
5. *z*: Fourier transform in $z$.
6. Transpose to *x*.
7. *x*: Expand domain size to $3N_x/2 \times N_y \times 3N_z/2$.
8. *x*: Fourier transform in $x$.
9. *x*: In real space, calculate the nonlinear terms of the Navier-Stokes equations.
10. *x*: Fourier transform in $x$.
11. *x*: Prune domain to $N_x \times N_y \times 3N_z/2$.
12. Transpose to *z*.
13. *z*: Fourier transform in $z$.
14. *z*: Prune domain to $N_x \times N_y \times N_z$.
15. Transpose to *y*.
16. *y*: SS3F: Fourier transform in $y$; SWT: Chebyshev transform in $y$.
17. *y*: If dealiasing in $y$, zero the high wavenumbers.
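
The ordering above can be traced with a small Python sketch that follows the global array shape through each step and adds up the number of elements handed to each transposition. It is a sketch under stated assumptions, not code from SWT, SS3F or 2DECOMP&FFT: the step encoding and the function names (`pipeline`, `transposed_volume`) are hypothetical, the wave-space sizes are taken to be $N_x \times N_y \times N_z$, the expansion factor in $x$ and $z$ is assumed to be 3/2, and element size (real versus complex) is ignored. Transforms and the nonlinear step do not change the shape, so they are recorded but otherwise skipped.

```python
from fractions import Fraction

PAD = Fraction(3, 2)  # assumed expansion factor in x and z (3/2 rule)

def pipeline(nx, nz):
    """Hypothetical encoding of the ordering: (pencil, action, detail).
    Shapes are indexed (0, 1, 2) = (x, y, z) in SWT/SS3F notation."""
    return [
        ("y", "transform", "y"),             # Fourier (SS3F) or Chebyshev (SWT)
        ("-", "transpose", "z"),
        ("z", "resize", (2, PAD * nz)),      # expand in z
        ("z", "transform", "z"),
        ("-", "transpose", "x"),
        ("x", "resize", (0, PAD * nx)),      # expand in x
        ("x", "transform", "x"),
        ("x", "nonlinear", None),            # real space: nonlinear terms
        ("x", "transform", "x"),
        ("x", "resize", (0, Fraction(nx))),  # prune in x
        ("-", "transpose", "z"),
        ("z", "transform", "z"),
        ("z", "resize", (2, Fraction(nz))),  # prune in z
        ("-", "transpose", "y"),
        ("y", "transform", "y"),
        ("y", "zero_high_modes", "y"),       # only if dealiasing in y
    ]

def transposed_volume(nx, ny, nz):
    """Sum the global array size at every transposition in the pipeline."""
    shape = [Fraction(nx), Fraction(ny), Fraction(nz)]
    total = Fraction(0)
    for _pencil, action, detail in pipeline(nx, nz):
        if action == "resize":
            axis, size = detail
            shape[axis] = size
        elif action == "transpose":
            total += shape[0] * shape[1] * shape[2]
        # "transform", "nonlinear" and "zero_high_modes" leave the shape unchanged
    return total

nx = ny = nz = 64
pruned = transposed_volume(nx, ny, nz)
padded = 4 * (PAD * nx) * ny * (PAD * nz)  # four transposes of the fully expanded array
print(pruned / (nx * ny * nz), padded / (nx * ny * nz), 1 - pruned / padded)
# prints: 5 9 4/9
```

Exact fractions are used so that the ratio comes out as 4/9 rather than a rounded float.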

Ordering the operations in this way significantly reduces the total data volume that requires transposition, relative to operating throughout on a $3N_x/2 \times N_y \times 3N_z/2$ global domain. For backward and forward transformation of a single variable, the four transpositions move $2N_xN_yN_z + 2 \cdot \tfrac{3}{2}N_xN_yN_z = 5N_xN_yN_z$ array elements, rather than $4 \cdot \tfrac{9}{4}N_xN_yN_z = 9N_xN_yN_z$, a saving of 4/9, or about 44 %. A general 3-D FFT supporting this approach would need information about the nature of the dealiasing to be performed in each direction (for instance, as noted in section 1 above, the 3/2 rule is specific to equations with quadratic nonlinearity).
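
Why expanding by 3/2 before forming a quadratic term, and pruning afterwards, removes aliasing can be checked in one dimension. The Python sketch below is illustrative only: it uses a complex Fourier series with modes $|k| \le K$, evaluates the Fourier sums directly in place of an FFT, and all function names and test coefficients are made up. It does not reproduce the real-to-complex or Chebyshev transforms of SWT and SS3F. The product of two such series has modes up to $|k| = 2K$; on an $m$-point grid, mode $q$ aliases onto $q \bmod m$, so the retained modes stay clean only if $m > 3K$.

```python
import cmath

def to_physical(coeffs, m):
    """Expand: evaluate u(x_n) = sum_k c_k exp(i k x_n) at x_n = 2*pi*n/m.
    Choosing m larger than needed is equivalent to zero-padding in wave space."""
    return [sum(c * cmath.exp(2j * cmath.pi * k * n / m) for k, c in coeffs.items())
            for n in range(m)]

def to_spectral(values, kmax):
    """Prune: discrete Fourier coefficients for |k| <= kmax only. Modes of the
    product that the m-point grid cannot resolve alias onto these."""
    m = len(values)
    return {k: sum(val * cmath.exp(-2j * cmath.pi * k * n / m)
                   for n, val in enumerate(values)) / m
            for k in range(-kmax, kmax + 1)}

def product_via_grid(u, v, kmax, m):
    """Multiply u and v pointwise on an m-point grid, then prune to |k| <= kmax."""
    uu, vv = to_physical(u, m), to_physical(v, m)
    return to_spectral([a * b for a, b in zip(uu, vv)], kmax)

def exact_product(u, v, kmax):
    """Alias-free reference: truncated convolution of the two coefficient sets."""
    w = {k: 0j for k in range(-kmax, kmax + 1)}
    for p, a in u.items():
        for q, b in v.items():
            if abs(p + q) <= kmax:
                w[p + q] += a * b
    return w

K = 4              # retained wavenumbers |k| <= K
N = 2 * K + 2      # unpadded grid size
M = 3 * N // 2     # padded grid size (3/2 rule); M > 3K is what matters
u = {k: complex(1 / (1 + abs(k)), 0.1 * k) for k in range(-K, K + 1)}
v = {k: complex(0.5, 1 / (2 + k * k)) for k in range(-K, K + 1)}

exact = exact_product(u, v, K)
unpadded = product_via_grid(u, v, K, N)
padded = product_via_grid(u, v, K, M)
unpadded_err = max(abs(unpadded[k] - exact[k]) for k in exact)
padded_err = max(abs(padded[k] - exact[k]) for k in exact)
print(f"unpadded: {unpadded_err:.2e}  padded: {padded_err:.2e}")
```

On the unpadded grid the error is of order one, because product modes such as $k = -6$ fold onto $k = 4$; on the padded grid it drops to rounding level. A cubic nonlinearity would produce modes up to $3K$ and need a larger expansion, which is why the factor depends on the form of the equations.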

Note that $x$, $y$ and $z$ above correspond to the notation used in SWT and SS3F, but
*not* to that of 2DECOMP&FFT. To translate, exchange $y$ and $z$.
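
The exchange of $y$ and $z$ follows from the transpositions that 2DECOMP&FFT provides: its decomposition API (Fortran module `decomp_2d`) offers `transpose_x_to_y`, `transpose_y_to_z`, `transpose_z_to_y` and `transpose_y_to_x`, with no direct exchange between X- and Z-pencils. The sequence of pencils above (*y*, *z*, *x*, *z*, *y*) therefore has to map onto Z, Y, X, Y, Z. The Python sketch below only manipulates the routine names as strings to show this mapping; the helper `decomp_transposes` and its `axis_map` parameter are hypothetical.

```python
# Names of the four transposition routines in 2DECOMP&FFT's decomposition API
# (Fortran subroutines in module decomp_2d; only the names are used here).
DECOMP_TRANSPOSES = {
    ("x", "y"): "transpose_x_to_y",
    ("y", "z"): "transpose_y_to_z",
    ("z", "y"): "transpose_z_to_y",
    ("y", "x"): "transpose_y_to_x",
}

# SWT/SS3F direction -> 2DECOMP&FFT direction: exchange y and z.
SWT_TO_DECOMP = {"x": "x", "y": "z", "z": "y"}

def decomp_transposes(swt_pencils, axis_map=SWT_TO_DECOMP):
    """Return the 2DECOMP&FFT routine for each transposition in a sequence of
    pencil orientations written in SWT/SS3F notation."""
    pencils = [axis_map[p] for p in swt_pencils]
    routines = []
    for src, dst in zip(pencils, pencils[1:]):
        if (src, dst) not in DECOMP_TRANSPOSES:
            raise ValueError(f"2DECOMP&FFT has no direct {src}-to-{dst} transpose")
        routines.append(DECOMP_TRANSPOSES[(src, dst)])
    return routines

print(decomp_transposes(["y", "z", "x", "z", "y"]))
# prints: ['transpose_z_to_y', 'transpose_y_to_x', 'transpose_x_to_y', 'transpose_y_to_z']
```

Passing the identity mapping `{"x": "x", "y": "y", "z": "z"}` instead raises `ValueError` at the *z*-to-*x* step, which is the situation the exchange avoids.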

It proved reasonably straightforward to implement the above approach using the domain decomposition API of 2DECOMP&FFT, and the results of doing so are discussed in section 4 below.