The upgraded version of DSTAR can use approximately 50 times more MPI tasks and grid points than the pre-dCSE version with good efficiency; in fact, the limit of scalability has not yet been reached because of the practicalities of running very large jobs on HECToR. Serial performance was improved significantly (by approximately 20%), and the parallel efficiency of the 2D decomposition was improved by approximately 50%.
Both aspects (large-scale parallelism and serial speed) are very important for the exploration of new physical regimes with DSTAR, because the integration time step is controlled by the microscopic (chemical) time scale, while the flow characteristics are determined by the macroscopic geometry and a much larger time scale.
The monitor data and the debug or log messages are written to single files. An MPI-IO version of the restart file read/write has been provided, which allows for flexibility, while the faster Fortran binary version has been preserved.
The source code was reorganized into Fortran 90 modules, which makes the overall code structure clearer. In turn, this helps to resolve programming errors faster and makes it possible to write simpler code in the top-level subroutines. As an example of enhanced usability, we mention the introduction of dynamic memory allocation for the data arrays, which permits the same executable to be used for different grid sizes.
While working on the DSTAR code, it became apparent that even more could be done to improve DSTAR performance and usability.
As mentioned before, communication within the 2D domain decomposition consumes approximately 50% of the run time. A preliminary study shows that this can be decreased significantly by introducing a computation-communication overlap algorithm with the help of mixed-mode programming. This concept could also be extended to the IO operations by creating a subset of IO nodes that execute IO operations concurrently, while computation is reserved for the other nodes.
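The overlap idea can be illustrated with a small sketch. It is not DSTAR code: in the real application the exchange would be MPI traffic between neighbouring sub-domains and the helper would typically be a thread in a mixed MPI/threads program. Here a Python thread stands in for the communication, the domain is a periodic 1D array with one halo cell at each end, and the names `exchange_halo`, `stencil`, `step_overlapped` and `step_blocking` are all hypothetical. The point is only the ordering: start the exchange, update the cells that need no halo data, then finish the edge cells once the halo values have arrived.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def exchange_halo(u):
    """Stand-in for a halo exchange on a periodic 1D domain (hypothetical).

    Returns the values that would arrive from the left and right neighbours.
    """
    time.sleep(0.01)  # pretend the network transfer takes a while
    return u[-2], u[1]


def stencil(left, centre, right):
    """Three-point average, standing in for the real update."""
    return (left + centre + right) / 3.0


def step_overlapped(u):
    """Update the owned cells u[1:-1]; u[0] and u[-1] are halo cells.

    Assumes at least two owned cells.
    """
    n = len(u) - 2  # number of owned cells
    new = list(u)
    with ThreadPoolExecutor(max_workers=1) as pool:
        # 1. Start the communication in a helper thread.
        pending = pool.submit(exchange_halo, u)
        # 2. Meanwhile, update the cells that do not touch the halo.
        for i in range(2, n):
            new[i] = stencil(u[i - 1], u[i], u[i + 1])
        # 3. Wait for the halo values to arrive.
        left_halo, right_halo = pending.result()
    # 4. Finish the two edge cells, which depend on the halo values.
    new[1] = stencil(left_halo, u[1], u[2])
    new[n] = stencil(u[n - 1], u[n], right_halo)
    new[0], new[-1] = left_halo, right_halo
    return new


def step_blocking(u):
    """Reference version: communicate first, then compute everything."""
    left_halo, right_halo = exchange_halo(u)
    padded = [left_halo] + u[1:-1] + [right_halo]
    new = list(padded)
    for i in range(1, len(u) - 1):
        new[i] = stencil(padded[i - 1], padded[i], padded[i + 1])
    return new
```

Both functions produce the same result; the overlapped version simply hides the waiting time behind the interior update. How much is gained in practice depends on the ratio of interior work to communication time, which this sketch does not attempt to model.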
From a usability perspective, the following points will be considered for a future project: i) a structured input file that provides information in three logical blocks: a) physical parameters, b) computational parameters and c) data collection parameters; ii) better user documentation for the input parameters and data post-processing; iii) an extended testing module, accessible as a Makefile task, for validating new code developments; and iv) improved data collection with the help of the IO function from the 2DECOMP library.
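One possible shape for the structured input file in point i) is sketched below. This is an assumption made for illustration, not an existing DSTAR format: the INI-style layout, the block names `physical`, `computational` and `data_collection`, every key inside them, and the helper `read_input` are hypothetical. The sketch only shows how three logical blocks could be kept apart and checked when the file is read.

```python
import configparser

# Hypothetical input file with the three logical blocks.
EXAMPLE_INPUT = """
[physical]
reynolds_number = 1000.0
pressure = 1.0

[computational]
nx = 128
ny = 128
nz = 64
p_row = 4
p_col = 8

[data_collection]
monitor_interval = 100
restart_format = mpi-io
"""

REQUIRED_BLOCKS = ("physical", "computational", "data_collection")


def read_input(text):
    """Parse the input text and return one dictionary per logical block.

    Raises ValueError if any of the three blocks is missing.
    Values are returned as strings; the caller converts them as needed.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    missing = [b for b in REQUIRED_BLOCKS if not parser.has_section(b)]
    if missing:
        raise ValueError("missing input block(s): " + ", ".join(missing))
    return {b: dict(parser[b]) for b in REQUIRED_BLOCKS}


params = read_input(EXAMPLE_INPUT)
grid = tuple(int(params["computational"][k]) for k in ("nx", "ny", "nz"))
```

Keeping the blocks separate means a missing block is reported when the file is read, and the documentation in point ii) could then be organised block by block.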
Lucian Anton 2011-09-13