In this dCSE project, the primary goal was to improve the scalability of the code by parallelising the I/O routines. To that end there were two objectives:

Currently, the code exhibits poor strong scaling: the input is performed in serial on the master process, and the output is performed by each process but is serialised in a round-robin pattern. Fig. 1 shows the effect of this on overall performance by comparing the complete runtime with that of the solver alone.
With respect to the purely serial case (1 MPI process), the maximum speedup is limited to $\sim$22. The theoretical scalability limit for this code is the number of z-planes in the problem, since the decomposition is one-dimensional. Throughout this report, we use the larger of the two test cases supplied by Prof. Fagan's group, the High res model, which has 884 z-planes comprising $\sim$29 million elements.
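To illustrate why the number of z-planes bounds the usable process count, the following is a minimal Python sketch of a contiguous one-dimensional block decomposition. It is an illustration only: the function name `zplane_range` is hypothetical, and the way PARA-BMU actually assigns planes to processes is not described here and may differ.

```python
def zplane_range(rank, n_procs, n_planes=884):
    """Return (first_plane, n_local) for one rank.

    Assumes a contiguous block decomposition in z; this is an
    illustrative choice, not necessarily what PARA-BMU does.
    """
    base, extra = divmod(n_planes, n_procs)
    # The first `extra` ranks take one additional plane each.
    n_local = base + (1 if rank < extra else 0)
    first_plane = rank * base + min(rank, extra)
    return first_plane, n_local


def idle_ranks(n_procs, n_planes=884):
    """Count ranks that receive no z-plane at all."""
    return sum(1 for r in range(n_procs)
               if zplane_range(r, n_procs, n_planes)[1] == 0)
```

With 884 z-planes, any process beyond the 884th receives no plane under this scheme, so adding further processes cannot shorten the solver time.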

Figure 1: Strong scaling of PARA-BMU using only serial I/O showing ideal case (Linear), solver only (Solver) and complete runtime (Total).
\includegraphics[width=0.88\textwidth, keepaspectratio]{serialscaling2.eps}
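
As a rough consistency check on Fig. 1, Amdahl's law can be applied under the assumption (not part of the original analysis) that the serial I/O cost is a fixed fraction $f$ of the single-process runtime and does not shrink as processes are added. The speedup on $P$ processes is then

\[
S(P) = \frac{1}{f + (1-f)/P}, \qquad \lim_{P\to\infty} S(P) = \frac{1}{f}.
\]

If the observed maximum of $\sim$22 is taken as this limit, the non-scaling fraction would be $f \approx 1/22 \approx 4.5\%$ of the serial runtime. This is an estimate only: the measured maximum may lie below the true asymptote, in which case $f$ would be smaller.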

For backwards compatibility, it is desirable to be able to reuse existing data files; for that reason, all serial I/O routines are preserved. In addition, converter utilities are provided to convert the old ASCII data files to and from the new format.