Performance of Cfoam-CABARET

The production test case used to demonstrate the scalability of the unstructured CABARET code was a 3D backward-facing step geometry with laminar flow boundary conditions, a Reynolds number of 5000 and a Mach number of 0.1. The grid contained $ 5.12 \times 10^7$ hexahedral cells, and scalability was measured over 276 time steps with no I/O involved. On Phase 2b of HECToR (Cray XT6, 24-core Magny-Cours processors), the code demonstrated more than 80% parallel efficiency on up to 1000 cores (PGI 10.8.0), using either 4 MPI tasks per node with 6 OpenMP threads each, or 2 MPI tasks per node with 12 OpenMP threads each.
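
The hybrid MPI/OpenMP placements quoted above (4 MPI tasks with 6 threads, or 2 MPI tasks with 12 threads, per 24-core node) rely on the code requesting an appropriate MPI threading level at start-up. The following minimal C sketch illustrates the general pattern only; the routine structure and names are illustrative and are not taken from Cfoam-CABARET.

    /* Minimal sketch of hybrid MPI/OpenMP start-up, as would be used for
     * placements such as 4 MPI tasks x 6 OpenMP threads per 24-core node.
     * Illustrative only; not taken from Cfoam-CABARET. */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided, rank, nranks;

        /* Threads only call MPI from the master thread, so
         * MPI_THREAD_FUNNELED is sufficient. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        #pragma omp parallel
        {
            /* Each MPI task spawns OMP_NUM_THREADS threads
             * (6 or 12 depending on the chosen placement). */
            #pragma omp single
            if (rank == 0)
                printf("%d MPI tasks x %d threads\n",
                       nranks, omp_get_num_threads());
        }

        MPI_Finalize();
        return 0;
    }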

However, for a typical $ 5.12 \times 10^7$ cell Gambit-generated grid, reading in the mesh took around 18 minutes and writing the Tecplot360 output file more than 25 minutes. The input data files are stored in ASCII, whereas the checkpoint files are in binary and are therefore much faster to write, typically taking around 10 minutes. Furthermore, all of these timings increase linearly when more than 1000 cores are used. Hence a small part of this dCSE project was to improve the I/O in the existing code.
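
The gap between the ASCII input timings and the binary checkpoint timings is essentially the difference between formatted and unformatted output. The short C fragment below contrasts the two for a cell-centred array; the function and variable names (write_ascii, write_binary, u, ncells) are placeholders and not taken from the code.

    /* Illustrative comparison of ASCII (formatted) versus binary
     * (unformatted) output of a cell-centred array; names are placeholders. */
    #include <stdio.h>

    void write_ascii(const char *fname, const double *u, long ncells)
    {
        FILE *fp = fopen(fname, "w");
        for (long i = 0; i < ncells; i++)
            fprintf(fp, "%23.16e\n", u[i]);   /* one text record per cell */
        fclose(fp);
    }

    void write_binary(const char *fname, const double *u, long ncells)
    {
        FILE *fp = fopen(fname, "wb");
        fwrite(u, sizeof(double), (size_t)ncells, fp);  /* single raw block */
        fclose(fp);
    }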

By the start of this project, the unstructured CABARET code had evolved into Cfoam-CABARET. The main difference was that Gambit was no longer used to generate the initial input grid; instead, OpenFOAM 1.7.1 had been adopted both as the unstructured grid generation tool and as the mechanism for parallel decomposition. Reading in the input now took roughly half as long as the single-process method with the Gambit-generated mesh, and the output timings were also improved. However, this is because each process reads from and writes to its own separate data files (for both input and output). This is a consequence of the OpenFOAM I/O structure, and a separate post-processing application called outFoamX has been developed to gather the separate output files and generate a single file for Tecplot360.
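
In this file-per-process model, each MPI rank simply writes its own results file under its own processor* directory, with no communication in the I/O path. The sketch below shows the pattern in C; it assumes MPI is already initialised, and the routine name, the raw binary layout and the exact file-name format are assumptions made for illustration only.

    /* Sketch of the file-per-process output pattern: every rank writes its
     * own results file under processor<rank>/.  Assumes MPI is already
     * initialised; names and file-name format are hypothetical. */
    #include <mpi.h>
    #include <stdio.h>

    void write_rank_local(const double *cells, long ncells, int step)
    {
        int rank;
        char path[256];

        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        snprintf(path, sizeof(path), "processor%d/ResCells%07d", rank, step);

        FILE *fp = fopen(path, "wb");
        fwrite(cells, sizeof(double), (size_t)ncells, fp);
        fclose(fp);
    }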

In more detail, for Cfoam-CABARET the input grid is stored such that the part relevant to each process is held in the directory processor* (where * is the number of the MPI process). Each process must read in its own file at the beginning of a simulation. For restarts and visualisation, results are written in processor* as ResCells000000n and ResFaces000000n, where n is related to the time step. For a restart, each process reads in ResCells000000n and ResFaces000000n from the relevant time step. For post-processing, outFoamX reads in ResCells000000n and ResFaces000000n, along with the OpenFOAM grid, and produces a series of *.plt files suitable for reading by Tecplot360.
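
The gather performed by outFoamX can be thought of as a serial loop over the processor* directories, reading each per-process results file and appending its contents to a single output that is then converted for Tecplot360. The following C sketch illustrates that pattern only; it is not the actual outFoamX implementation, and the raw binary layout and file-name format are assumptions.

    /* Sketch of a serial gather over per-process results files, in the
     * spirit of outFoamX; not the actual implementation. */
    #include <stdio.h>
    #include <stdlib.h>

    void gather_results(int nproc, int step, const char *outname)
    {
        FILE *out = fopen(outname, "wb");
        char path[256], buf[1 << 16];
        size_t n;

        for (int p = 0; p < nproc; p++) {
            snprintf(path, sizeof(path), "processor%d/ResCells%07d", p, step);
            FILE *in = fopen(path, "rb");
            if (!in) { fprintf(stderr, "missing %s\n", path); exit(1); }
            while ((n = fread(buf, 1, sizeof(buf), in)) > 0)
                fwrite(buf, 1, n, out);   /* append this rank's cells */
            fclose(in);
        }
        fclose(out);
    }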

It is worth noting that although this method is faster than a single-process master I/O model, it requires a very large number of post-processing files, which in turn has the potential to cause problems on the HECToR file system: for a typical simulation, thousands of files are required for each MPI process. Therefore the implementation of an MPI-IO model for this data is still appropriate, as it will reduce the number of files required by Cfoam-CABARET and, for future simulations, enable restarts and post-processing with different numbers of processors if required, subject to the OpenFOAM input grid. The work related to this development is described in more detail in Section 4 of this report.
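
The MPI-IO model referred to above replaces the per-process files with a single shared file, into which each rank writes its cell data at an offset derived from the cell counts of the lower-numbered ranks. A minimal sketch of this pattern is given below; it is illustrative only and is not the implementation developed for Cfoam-CABARET, which is described in Section 4.

    /* Minimal sketch of writing all cell data to one shared file with
     * MPI-IO; offsets follow from an exclusive prefix sum of the local
     * cell counts.  Illustrative only. */
    #include <mpi.h>

    void write_shared(const double *cells, long ncells, const char *fname)
    {
        MPI_File fh;
        long offset_cells = 0;

        /* Exclusive prefix sum: number of cells held by lower-ranked tasks. */
        MPI_Exscan(&ncells, &offset_cells, 1, MPI_LONG, MPI_SUM, MPI_COMM_WORLD);

        MPI_File_open(MPI_COMM_WORLD, fname,
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

        /* Collective write: every rank deposits its block at its own offset. */
        MPI_File_write_at_all(fh, (MPI_Offset)offset_cells * sizeof(double),
                              cells, (int)ncells, MPI_DOUBLE, MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
    }

Because all ranks write into one file with explicit offsets, a later run can reread the same file with a different number of processes, which is what makes restarts and post-processing at different processor counts possible.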
