The original work plan called for use of 2DECOMP&FFT's FFT API, that is, its `black-box' 3-D FFT routines. On investigation, it turned out that a lower-level approach was required. This is in fact more convenient, because the original codes were written in the expectation that optimisations involving intermediate arrays (which a 3-D FFT would hide) are possible; removing these would mean writing new code that would probably be less efficient. The need to implement dealiasing is a further obstacle to using the black-box routines.
Dealiasing is most efficiently performed by pruning the array at each parallel transposition. The order of operations eventually adopted is as follows (the initial letter of each item denotes the direction in which the domain is not decomposed, e.g. y implies y-pencils containing all points in y but only a subset in each of the other two directions):
This significantly reduces the total data volume that requires transposition, relative to operating on the full global domain size at every stage. For backward and forward transformation of a single variable, the saving is 44 %. A general 3-D FFT supporting this approach would require information relating to the nature of the dealiasing to be performed in each direction (for instance, as noted in section 1 above, the dealiasing rule used here is specific to equations with quadratic nonlinearity).
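As a rough check on the 44 % figure, the following Python sketch counts the grid points moved by the transposes under one set of assumptions that happens to reproduce it. These assumptions are illustrative and not taken from the codes themselves: a cubic grid of n points per direction, a padding factor of 3/2 (the usual choice for quadratic nonlinearity), two transposes per 3-D transform, and a cost proportional to the number of points moved. The function name `transpose_volume` is hypothetical.

```python
def transpose_volume(n, pad=1.5, pruned=True):
    """Grid points moved by the two transposes of one 3-D transform.

    Assumes a cubic n^3 grid and a padding factor `pad` in each direction.
    """
    m = pad * n
    if pruned:
        # First transpose: only one direction is at padded size.
        # Second transpose: two directions are at padded size.
        return m * n * n + m * m * n
    # Without pruning, both transposes move the fully padded array.
    return 2 * m ** 3


n = 128
pruned_cost = 2 * transpose_volume(n, pruned=True)   # backward + forward
full_cost = 2 * transpose_volume(n, pruned=False)    # backward + forward
saving = 1 - pruned_cost / full_cost
print(f"{saving:.1%}")  # prints 44.4%
```

Under these assumptions the pruned scheme moves 7.5 n^3 points per backward and forward pair, against 13.5 n^3 for the fully padded domain, and the ratio is independent of n.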
Note that the direction labels used above correspond to the notation used in SWT and SS3F, but not that of 2DECOMP&FFT. To translate, two of the labels must be exchanged.
It proved reasonably straightforward to implement the above approach using the domain decomposition API of 2DECOMP&FFT, and the results of doing so are discussed in section 4 below.
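The idea of pruning between 1-D transforms can be illustrated serially with NumPy. This is only a sketch under stated assumptions: it does not use 2DECOMP&FFT, the parallel transposes are notional (marked by comments), the axis order is arbitrary rather than the one adopted in the codes, and the helper names `keep_low_modes`, `forward_pruned` and `forward_full` are hypothetical. It checks that pruning after each 1-D FFT gives the same retained modes as a full 3-D FFT followed by truncation.

```python
import numpy as np


def keep_low_modes(a, n_keep, axis):
    """Keep the n_keep lowest-wavenumber modes of a full complex spectrum."""
    n = a.shape[axis]
    half = n_keep // 2
    # Non-negative wavenumbers sit at the start, negative ones at the end.
    idx = np.r_[0:half, n - (n_keep - half):n]
    return np.take(a, idx, axis=axis)


def forward_pruned(u, n_keep):
    """1-D FFT along each axis in turn, pruning before each notional transpose."""
    a = u.astype(complex)
    for axis in range(3):
        a = np.fft.fft(a, axis=axis)
        a = keep_low_modes(a, n_keep, axis)
        # A parallel code would transpose the (now smaller) array here.
    return a


def forward_full(u, n_keep):
    """Full 3-D FFT on the large grid, truncated only at the end."""
    a = np.fft.fftn(u)
    for axis in range(3):
        a = keep_low_modes(a, n_keep, axis)
    return a


rng = np.random.default_rng(0)
u = rng.standard_normal((12, 12, 12))   # physical-space grid (padded size)
a_pruned = forward_pruned(u, 8)         # retain 8 modes per direction
a_full = forward_full(u, 8)
print(a_pruned.shape, np.allclose(a_pruned, a_full))
```

The two routes agree because a 1-D FFT along one axis commutes with truncation along another; the pruned route simply carries less data through the later stages.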