
2-D domain decomposition using the 2DECOMP&FFT library

The original work plan called for use of 2DECOMP&FFT's FFT API, that is, its `black-box' 3-D FFT routines. On investigation, it turned out that a lower-level approach, built directly on the library's domain decomposition API, was required. This is in fact more convenient, because the original codes were written in the expectation that optimisations involving intermediate arrays (which a 3-D FFT would hide) are possible; removing these optimisations would mean writing new code that would probably be less efficient. The need to implement dealiasing is a further obstacle to using a black-box 3-D FFT.

Dealiasing is most efficiently performed by pruning the array at each parallel transposition. The order of operations eventually adopted is as follows (the initial letter of each item denotes the direction in which the domain is not decomposed, e.g. y implies $y$-pencils containing all points in $y$ but only a subset in $x$ and $z$):

$\Rightarrow$ y: In wave space - global domain size $N_x \times 3 N_y / 2 \times N_z$.
$\Rightarrow$ y: SS3F - Fourier transform in $y$; SWT - Chebyshev transform in $y$.
$\Rightarrow$ Transpose $y$ to $z$.
$\Rightarrow$ z: Expand domain size to $N_x \times 3 N_y / 2 \times 3 N_z / 2$.
$\Rightarrow$ z: Fourier transform in $z$.
$\Rightarrow$ Transpose $z$ to $x$.
$\Rightarrow$ x: Expand domain size to $3 N_x / 2 \times 3 N_y / 2 \times 3 N_z / 2$.
$\Rightarrow$ x: Fourier transform in $x$.
$\Rightarrow$ x: In real space - calculate non-linear terms of Navier-Stokes equations.
$\Rightarrow$ x: Fourier transform in $x$.
$\Rightarrow$ x: Prune domain to $N_x \times 3 N_y / 2 \times 3 N_z / 2$.
$\Rightarrow$ Transpose $x$ to $z$.
$\Rightarrow$ z: Fourier transform in $z$.
$\Rightarrow$ z: Prune domain to $N_x \times 3 N_y / 2 \times N_z$.
$\Rightarrow$ Transpose $z$ to $y$.
$\Rightarrow$ y: SS3F - Fourier transform in $y$; SWT - Chebyshev transform in $y$.
$\Rightarrow$ y: If dealiasing in $y$ - zero high wavenumbers.
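
The "expand" and "prune" steps in the list above can be illustrated in one dimension. The following is a minimal sketch, not the code used in SWT or SS3F: the function names `expand` and `prune` are hypothetical, the coefficients are assumed to be stored in the usual FFT ordering (non-negative wavenumbers first, then negative ones), and the Nyquist mode is simply carried along with the negative wavenumbers.

```python
def expand(coeffs):
    """Zero-pad N Fourier coefficients to 3N/2 (3/2 rule).

    Assumes FFT ordering and an even N; zeros are inserted between
    the non-negative and the negative wavenumbers.
    """
    n = len(coeffs)
    assert n % 2 == 0, "sketch assumes an even number of modes"
    half = n // 2
    m = 3 * n // 2
    return list(coeffs[:half]) + [0j] * (m - n) + list(coeffs[half:])


def prune(coeffs):
    """Discard the padded high wavenumbers, returning 2M/3 coefficients."""
    m = len(coeffs)
    assert m % 3 == 0, "sketch assumes a length produced by expand()"
    n = 2 * m // 3
    half = n // 2
    return list(coeffs[:half]) + list(coeffs[m - (n - half):])


# Example: 8 modes are expanded to 12 and pruned back to 8.
modes = [complex(k) for k in range(8)]
padded = expand(modes)
restored = prune(padded)
```

In the scheme described here the same operation would be applied along the undecomposed direction of each pencil, so that each parallel transposition only moves the array at its current, partially expanded size.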

This significantly reduces the total data volume that requires transposition, relative to operating on a $3 N_x / 2 \times 3 N_y / 2 \times 3 N_z / 2$ global domain throughout. For backward and forward transformation of a single variable, the cost of the four transpositions, in units of $N_x N_y N_z$, is $3/2+9/4+9/4+3/2=15/2$, rather than $27/8+27/8+27/8+27/8 = 27/2$, a saving of about 44%. A general 3-D FFT supporting this approach would require information relating to the nature of the dealiasing to be performed in each direction (for instance, as noted in section 1 above, the $3/2$ rule is specific to equations with quadratic nonlinearity).
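
The arithmetic behind this comparison can be checked with a few lines of Python. This is only a sketch of the bookkeeping: each entry is the global array size at one of the four transpositions, measured in units of $N_x N_y N_z$, and the variable names are illustrative.

```python
from fractions import Fraction

R = Fraction(3, 2)  # expansion factor of the 3/2 rule in one direction

# Global array size moved by each transposition when the array is
# expanded and pruned in stages.
staged = [
    R,      # y -> z : expanded in y only
    R * R,  # z -> x : expanded in y and z
    R * R,  # x -> z : already pruned in x
    R,      # z -> y : already pruned in x and z
]

# Global array size if all four transpositions act on the fully
# expanded domain.
full = [R ** 3] * 4

cost_staged = sum(staged)             # 15/2
cost_full = sum(full)                 # 27/2
saving = 1 - cost_staged / cost_full  # 4/9, i.e. about 44%
```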

Note that $x$, $y$ and $z$ above correspond to the notation used in SWT and SS3F, but not that of 2DECOMP&FFT. To translate, exchange $y$ and $z$.
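
As an illustration of this translation, the sequence of transpositions listed above can be relabelled mechanically. The sketch below is purely a relabelling exercise; the mapping `SWAP` and the function `to_library_axis` are hypothetical names and are not part of 2DECOMP&FFT.

```python
# Exchange y and z to go from SWT/SS3F notation to library notation.
SWAP = {"x": "x", "y": "z", "z": "y"}


def to_library_axis(axis):
    """Translate a direction label from SWT/SS3F to library notation."""
    return SWAP[axis]


# Transpositions in the order given above, in SWT/SS3F notation.
transposes = [("y", "z"), ("z", "x"), ("x", "z"), ("z", "y")]

# The same transpositions in library notation.
library_transposes = [
    (to_library_axis(src), to_library_axis(dst)) for src, dst in transposes
]
```

In library notation the sequence becomes $z$ to $y$, $y$ to $x$, $x$ to $y$ and $y$ to $z$, so every step is an exchange between $x$- and $y$-pencils or between $y$- and $z$-pencils.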

It proved reasonably straightforward to implement the above approach using the domain decomposition API of 2DECOMP&FFT, and the results of doing so are discussed in section 4 below.


R.Johnstone 2012-07-31