From the perspective of simulating new physical regimes with DSTAR, the main task of this project was to implement a two-dimensional domain decomposition with the help of the 2DECOMP library described in Ref . The bulk of the work was adapting the local arrays that store the local grid data, together with a number of subroutines, to the 2D domain decomposition. More details are presented in Sections 3 and 5.
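To illustrate what adapting the local arrays to a 2D decomposition involves, the sketch below computes the index ranges owned by one MPI rank when a 3D grid is split over a 2D process grid. It is a plain Python illustration, not the 2DECOMP API: the function names `block_range` and `pencil_extent`, the row-major mapping of ranks to process-grid coordinates, and the rule that the first blocks absorb any remainder are all assumptions made for this example.

```python
def block_range(n, nparts, idx):
    """Half-open range [start, end) of block `idx` when `n` points
    are split into `nparts` contiguous blocks.

    Assumption: the first `n % nparts` blocks get one extra point.
    """
    base, extra = divmod(n, nparts)
    start = idx * base + min(idx, extra)
    size = base + (1 if idx < extra else 0)
    return start, start + size


def pencil_extent(grid, proc_grid, rank):
    """Local extents of an x-oriented pencil for a given rank.

    grid      -- (nx, ny, nz) global grid size
    proc_grid -- (p_row, p_col) 2D process grid
    rank      -- rank, mapped row-major onto proc_grid (assumption)

    The x direction is kept whole; y is split over p_row and
    z over p_col.
    """
    nx, ny, nz = grid
    p_row, p_col = proc_grid
    row, col = divmod(rank, p_col)
    return (0, nx), block_range(ny, p_row, row), block_range(nz, p_col, col)
```

Each rank would then allocate its local arrays with these extents (plus any halo cells) rather than with the global grid size, which is the kind of change the local arrays and subroutines need under a 2D decomposition.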
The new code was tested with up to 18,432 MPI tasks and shows good scaling efficiency (approximately 50%), although the 80% efficiency target in WP1 was not reached. That target was derived from scalability data for other fluid dynamics codes, and the multicomponent nature of DSTAR was overlooked when it was set. The multicomponent aspect results in a very large amount of data communication in the two-dimensional decomposition, hence the loss in efficiency with respect to other fluid dynamics models.
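For readers who want to reproduce an efficiency figure of this kind from their own timings, a minimal sketch of the usual strong-scaling definition follows. The text here does not state which definition or baseline run was used for the 50% figure, so the formula and the function name `parallel_efficiency` are assumptions.

```python
def parallel_efficiency(t_ref, n_ref, t, n):
    """Strong-scaling efficiency of a run on `n` tasks relative to a
    reference run on `n_ref` tasks, for the same problem size.

    Assumed definition: E = (t_ref * n_ref) / (t * n), where t_ref
    and t are wall-clock times. E = 1.0 means ideal scaling.
    """
    if min(t_ref, t) <= 0 or min(n_ref, n) <= 0:
        raise ValueError("times and task counts must be positive")
    return (t_ref * n_ref) / (t * n)
```

Under this definition, a run that uses eight times as many tasks but is only four times faster than the reference has an efficiency of 0.5.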
We pursued this matter further because, at large MPI task counts, the code spends approximately 50% of the run time in communication. We found that significantly better scalability can be achieved with an algorithm that overlaps communication with computation using a mixed-mode programming model (see Section 3, Ref ), but at the cost of rather more complex source code.
The input/output (IO) operations were changed from a multiple-file access pattern to a single-file access pattern. This replaces the original output mechanism, which wrote a large number of small files to the HECToR parallel file system and was therefore very inefficient. In the new version, the restart file (binary data) can be accessed with MPI-IO, although we have found that well-tuned Fortran binary IO is also efficient; see Section 4. In addition to the restart data, there is separate IO for the monitoring points that record physical quantities. This data is now collected in one ASCII text file and handled by a designated MPI process. An auxiliary program has been written to sort the monitoring-point data into individual files for the required visualisation. The log mechanism was updated to the same IO model. Details of this work are presented in Sections 4 and 5.
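The post-processing step for the monitoring points can be sketched as follows. This is not the auxiliary program itself: the record layout (one whitespace-separated line per sample, starting with an integer point identifier followed by the time and the values), the `#` comment convention, and the output file naming are all hypothetical, chosen only to show how one combined ASCII file can be split into per-point files.

```python
from collections import defaultdict
from pathlib import Path


def group_by_point(lines):
    """Group records by monitoring point, sorted by time.

    Assumed (hypothetical) record layout: "<point_id> <time> <values...>".
    Blank lines and lines starting with '#' are skipped.
    Returns {point_id: ["<time> <values...>", ...]}.
    """
    groups = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        point_id = int(fields[0])
        time = float(fields[1])
        groups[point_id].append((time, " ".join(fields[1:])))
    return {
        pid: [text for _, text in sorted(recs, key=lambda r: r[0])]
        for pid, recs in groups.items()
    }


def write_point_files(groups, out_dir, prefix="monitor_"):
    """Write one text file per monitoring point (file names are hypothetical)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for pid, records in groups.items():
        (out / f"{prefix}{pid:04d}.dat").write_text("\n".join(records) + "\n")
```

A typical use would be to open the combined file, pass the file object to `group_by_point`, and hand the result to `write_point_files`. Keeping the grouping separate from the writing makes the first step easy to check without touching the file system.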
Code modernisation was pursued in two main directions: i) replacing static arrays with allocatable ones, which allows larger grid sizes and lets a single executable handle arbitrary grid sizes; ii) grouping Fortran 77 subroutines into modules according to their functionality. This transformation has made the source code structure clearer. It also helps in locating errors and makes it possible to write simpler code for the top-level subroutines. For better control over data structures, implicit data types were replaced with explicit declarations in most of the source files. Optimisation work was also done on some subroutines of the fluid solver, which brought a speedup of approximately 20% for the core solver; more details are presented in Section 5.
Lucian Anton 2011-09-13