The original work plan called for use of 2DECOMP&FFT's FFT API, that is, its 'black-box' 3-D FFT routines. On investigation, it turned out that a lower-level approach was required. This is in fact also more convenient, because the original codes were written in the expectation that optimisations involving intermediate arrays (which a 3-D FFT routine would hide) are possible; removing these optimisations would mean writing new code that would probably be less efficient. The need to implement dealiasing is a further obstacle to using the black-box routines.
Dealiasing is most efficiently performed by pruning the array at each parallel transposition. The order of operations eventually adopted is as follows (the initial letter of each item denotes the direction in which the domain is not decomposed, e.g. y implies y-pencils containing all points in y but only a subset in x and z):
This significantly reduces the total data volume that requires transposition, relative to transposing the full, unpruned global domain at every stage. For backward and forward transformation of a single variable, the saving is 44 %. A general 3-D FFT supporting this approach would require information about the nature of the dealiasing to be performed in each direction (for instance, as noted in section 1 above, the dealiasing rule is specific to equations with quadratic nonlinearity).
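The size of this saving can be checked with a short calculation. The sketch below is illustrative only and rests on assumptions not stated in this section: N retained modes in each direction, a padding factor of 3/2 in every direction, and two transposes per 3-D transform. The function name `transposed_volume` is hypothetical.

```python
from fractions import Fraction


def transposed_volume(n, pad=Fraction(3, 2), pruned=True):
    """Points moved by the transposes of one backward plus one forward transform.

    Assumptions (illustrative): n^3 retained modes, the same padding factor
    `pad` in every direction, and two transposes per 3-D transform.
    """
    if pruned:
        # Backward transform: the first transpose happens after one direction
        # has been padded, the second after two directions have been padded.
        one_way = pad * n**3 + pad**2 * n**3
    else:
        # Without pruning, both transposes move the fully padded array.
        one_way = 2 * pad**3 * n**3
    # The forward transform mirrors the backward one, so the volume doubles.
    return 2 * one_way


n = 64
pruned = transposed_volume(n, pruned=True)
full = transposed_volume(n, pruned=False)
saving = 1 - pruned / full
print(f"pruned = {pruned / n**3} N^3, full = {full / n**3} N^3, "
      f"saving = {float(saving):.1%}")
```

Under these assumptions the saving evaluates to 4/9, or about 44 %, which is consistent with the figure quoted above.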
Note that x, y and z above follow the notation used in SWT and SS3F, not that of 2DECOMP&FFT. Translating between the two requires exchanging two of these direction labels.
It proved reasonably straightforward to implement the above approach using the domain decomposition API of 2DECOMP&FFT, and the results of doing so are discussed in section 4 below.
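The equivalence that makes pruning safe can be illustrated serially. The NumPy sketch below is not the SWT or SS3F implementation and does not call 2DECOMP&FFT; it assumes a 3/2 padding factor, processes the axes in an arbitrary order, and uses hypothetical helper names (`pad_axis`, `staged_backward`, `full_backward`). It pads and transforms one direction at a time, records the array shape at the points where a parallel code would transpose, and compares the result with padding the whole array first.

```python
import numpy as np


def pad_axis(coeffs, m, axis):
    """Zero-pad FFT-ordered coefficients from n to m points along one axis.

    Simplification: the Nyquist mode is kept with the negative wavenumbers.
    """
    n = coeffs.shape[axis]
    half = n // 2
    c = np.moveaxis(coeffs, axis, 0)
    out = np.zeros((m,) + c.shape[1:], dtype=complex)
    out[:half] = c[:half]              # non-negative wavenumbers
    out[m - (n - half):] = c[half:]    # negative wavenumbers
    return np.moveaxis(out, 0, axis)


def staged_backward(coeffs, m):
    """Pad and inverse-transform one axis at a time (the pruned approach)."""
    shapes = []
    a = coeffs
    for axis in range(3):
        a = np.fft.ifft(pad_axis(a, m, axis), axis=axis)
        if axis < 2:
            shapes.append(a.shape)     # a parallel code would transpose here
    return a, shapes


def full_backward(coeffs, m):
    """Pad all three axes first, then apply a single 3-D inverse transform."""
    a = coeffs
    for axis in range(3):
        a = pad_axis(a, m, axis)
    return np.fft.ifftn(a)


n, m = 4, 6                            # m = 3n/2 (assumed padding factor)
rng = np.random.default_rng(0)
coeffs = rng.standard_normal((n, n, n)) + 1j * rng.standard_normal((n, n, n))

staged, shapes = staged_backward(coeffs, m)
reference = full_backward(coeffs, m)
print(shapes)                          # array shapes at the two transposition points
print(np.allclose(staged, reference))
```

The two results agree because padding along one axis commutes with a transform along a different axis, so the arrays that reach each transposition can stay smaller than the fully padded grid.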