The HECToR Service is now closed and has been superseded by ARCHER.

Software Framework to Support Overlap of Communications and Computations in Implicit CFD Applications

This Distributed Computational Science and Engineering (dCSE) project concerns the continuing development of Incompact3D, an incompressible Navier-Stokes solver. Incompact3D is used by the Turbulence, Mixing and Flow Control group at Imperial College and its academic collaborators to conduct state-of-the-art turbulence studies. The work follows on from the successful first and second Incompact3D projects. In that earlier work, the scalability of Incompact3D was significantly improved through the creation of the 2DECOMP&FFT library, a 2D pencil decomposition communication framework with a distributed FFT interface. Although 2DECOMP&FFT allows Incompact3D to regularly use more than 10,000 cores on HECToR for production runs, the time taken by communications becomes more prominent at this core count, so there is still scope for further improvement.

The aim of this project is to reduce the time taken by the all-to-all communications in Incompact3D for large core counts by further developing the 2DECOMP&FFT framework to enable the overlap of communications and computations (OCC). This will not be straightforward because of the all-to-all communication pattern, the blocking implementations in present MPI libraries (based on the MPI-2.2 standard), and the general lack of mature software support. In particular, the following work will be performed:

  • Developing support for a flexible data layout in 2DECOMP&FFT. The existing (standard) (i,j,k)-ordering for the 3D arrays will be extended to allow other orderings such as (i,k,j). This will enable wider support for legacy applications, and it should also improve the performance of some numerical algorithms, because the changed memory access patterns may offer better cache efficiency. Most importantly, the flexible data layout will facilitate fine-grain partitioning and reduce unnecessary memory copying.
  • Creating alternative FFT code for Incompact3D by using non-blocking MPI collectives. A black-box FFT solver will be created that internally uses fine-grain OCC to enable the code to automatically benefit from the non-blocking MPI collectives in MPI-3. Furthermore, a high-level API will be provided in 2DECOMP&FFT to allow multiple independent FFTs to be performed in one function call (this will internally use coarse-grain OCC). A set of low-level APIs will also be created to facilitate OCC in various other algorithms.
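The coarse-grain OCC idea in the list above can be illustrated with a minimal sketch. It is not 2DECOMP&FFT code and does not use MPI: a background thread stands in for a non-blocking collective, and the names exchange, compute, pipelined and sequential are hypothetical. In an MPI-3 code the analogous pattern would start a transfer with a non-blocking collective such as MPI_Ialltoall and complete it later with MPI_Wait.

```python
from concurrent.futures import ThreadPoolExecutor


def exchange(block):
    """Stand-in for an all-to-all transpose (hypothetical; no MPI involved)."""
    return list(reversed(block))


def compute(block):
    """Stand-in for the local FFT work on one data set (hypothetical)."""
    return [2 * x for x in block]


def pipelined(blocks):
    """Process independent data sets, overlapping exchange k+1 with compute k."""
    results = []
    if not blocks:
        return results
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(exchange, blocks[0])   # start the first exchange
        for nxt in blocks[1:]:
            ready = pending.result()                 # wait for exchange k
            pending = pool.submit(exchange, nxt)     # start exchange k+1
            results.append(compute(ready))           # compute k in the meantime
        results.append(compute(pending.result()))    # drain the pipeline
    return results


def sequential(blocks):
    """Reference version with no overlap: exchange, then compute, per data set."""
    return [compute(exchange(b)) for b in blocks]
```

Both functions return the same results; only the schedule differs. Because the data sets are independent, the exchange for one can be in flight while another is being computed, which is the property that the multiple-FFT interface described above relies on.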

The outcome of this work may be summarised as follows:

  • A set of low-level APIs has been created in 2DECOMP&FFT to facilitate OCC in algorithms.
  • The effectiveness of these APIs was demonstrated in an application of multiple independent FFTs, achieving a performance gain of up to 15%.
  • A high-level API has been provided to enable multiple independent distributed FFTs to run in a single subroutine call.
  • By building on top of the 2DECOMP&FFT library, the CFD code Incompact3D is now MPI-3 ready and should benefit from the new MPI library development in the future.
  • A sample application has been created to demonstrate the use of fine-grain OCC in 3D distributed FFTs.
  • A more flexible data layout has been introduced in the 2DECOMP&FFT library.
  • The I/O code of 2DECOMP&FFT has been updated to support the additional data layouts.
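To make the data-layout items above concrete, the following sketch shows how the choice of ordering changes the memory stride of each logical axis. It is an illustration only, not the library's implementation: the function names strides and offset are hypothetical, indices are zero-based, and the ordering string is assumed to list axes from fastest- to slowest-varying in memory, as in Fortran's column-major arrays.

```python
def strides(shape, order):
    """Return the memory stride of each logical axis (i, j, k).

    shape: logical extents (ni, nj, nk).
    order: axis names from fastest- to slowest-varying in memory,
           e.g. "ijk" (the standard layout) or "ikj".
    """
    extent = dict(zip("ijk", shape))
    stride, step = {}, 1
    for axis in order:
        stride[axis] = step      # elements skipped per unit step along this axis
        step *= extent[axis]
    return stride["i"], stride["j"], stride["k"]


def offset(index, shape, order):
    """Linear memory offset of logical element (i, j, k), zero-based."""
    return sum(n * s for n, s in zip(index, strides(shape, order)))


# For a 4 x 3 x 2 array, swapping j and k changes which axis is contiguous
# after i: strides are (1, 4, 12) for "ijk" and (1, 8, 4) for "ikj".
print(strides((4, 3, 2), "ijk"), strides((4, 3, 2), "ikj"))
```

A loop that sweeps along k touches memory 12 elements apart in the "ijk" layout but only 4 apart in "ikj", which is the kind of access-pattern change that can affect cache efficiency.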

Software developed within this project has been released wherever appropriate:

  • Version 1.5 of the 2DECOMP&FFT library was released in October 2012. This release includes a number of new APIs derived from this dCSE project.
  • The CFD code Incompact3D has been available as an open-source Google project since November 2012.

A report summarising this project is available in PDF and HTML formats.