Porting and Optimisation of Code_Saturne on HECToR
The Computational Fluid Dynamics (CFD) software, Code_Saturne, has been under development since 1997 by EDF. The software is based on a collocated Finite Volume Method (FVM) that accepts three-dimensional meshes built with any type of cell (tetrahedral, hexahedral, prismatic, pyramidal, polyhedral) and with any type of grid structure (unstructured, block structured, hybrid). This allows Code_Saturne to model highly complex geometries. It can simulate either incompressible or compressible flows with or without heat transfer and turbulence. It was designed as a parallel code and uses pre-processing for mesh partitioning in order to produce the input files for the solver. The output is then post-processed and converted into readable files for visualization (e.g. for ParaView). Since 2007, Code_Saturne has been open-source.
The overall aims of this project were to:
- Improve the pre/post-processing for Code_Saturne to enable efficient scalability on up to 8192 processors on HECToR Phase 2a and beyond.
- Evaluate open-source mesh partitioning software and determine the package that best enables the main solver in Code_Saturne to run with an optimally load-balanced and communication-efficient solution, while also minimising the memory requirements and the time taken for the partitioning.
The outcomes of the project are:
- The open-source mesh partitioning packages Metis 5.0pre, ParMetis 3.1.1, PT-Scotch 5.1 and Zoltan 3.0 were tested for efficiency with Code_Saturne.
- Metis is a sequential code and is therefore limited by memory requirements. For the 121M tetrahedral element simulation using Code_Saturne, the partition obtained using Metis consistently provided the best decomposition and required the least wall-clock time for the simulation. However, the difference between the other packages was not found to be significant.
- In contrast, the memory constraints did vary with each package and PT-Scotch was able to generate mesh partitions in parallel (up to 131072 domains) using only 16 cores whereas ParMetis 3.1.1 required a minimum of 512 cores to create the 131072 domains. An analysis of the metrics suggests that the larger number of cores required by ParMetis results in a partition with a poor load balance. In practice, however, the simulation run time did not reflect this observation and, for up to 1024 cores, ParMetis produced the lower time to solution.
- Above 1024 cores, and up to 8192 cores, the sequential version of Metis showed the best speed-up. For 2048 and 4096 cores, PT-Scotch performed better than ParMetis. Preliminary results using Zoltan did not show comparably good performance.
- Although ParMetis and PT-Scotch perform parallel mesh partitioning and are not limited by memory, in order to maintain the quality of the partition (edges cut, load balance), some extra monitoring may be required.
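The two partition quality measures named above (edges cut, load balance) can be illustrated with a minimal sketch. It is not taken from Code_Saturne or from any of the partitioning packages: the function names, the toy mesh and the choice of "largest domain size divided by mean domain size" as the load balance measure are all assumptions made for illustration.

```python
from collections import Counter


def edge_cut(edges, part):
    """Count edges whose two end cells are assigned to different domains."""
    return sum(1 for u, v in edges if part[u] != part[v])


def load_imbalance(part, n_domains):
    """Largest domain size divided by the ideal (mean) size; 1.0 is perfect."""
    sizes = Counter(part)
    ideal = len(part) / n_domains
    return max(sizes.values()) / ideal


# Hypothetical toy mesh: 6 cells in a ring, one edge per shared face.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5)]

# part[i] is the domain assigned to cell i (illustrative data only).
balanced = [0, 0, 0, 1, 1, 1]
skewed = [0, 0, 0, 0, 1, 1]

for name, part in (("balanced", balanced), ("skewed", skewed)):
    print(name, edge_cut(edges, part), round(load_imbalance(part, 2), 3))
```

In this toy case both partitions cut the same number of edges, yet the second places more cells in one domain. This is why the two measures would need to be monitored together rather than either one alone.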
Please see PDF or HTML for a report which summarises this work.